SQL Server 2005 Error: 701, Severity: 17, State: 193
Here’s another interesting example which we worked on recently in the Stockholm office, and which also relates back to one of my previous entries. We had a customer whose error log was filled with a variety of critical errors such as these:
Error: 701, Severity: 17, State: 193. There is insufficient system memory to run this query.
Error: 1204, Severity: 19, State: 2. The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions.
Error: 17803, Severity: 20, State: 13. There was a memory allocation failure during connection establishment. Reduce nonessential memory load, or increase system memory. The connection has been closed. [CLIENT: <local machine>]
This was a fairly high-end system with 64GB RAM and 16 logical CPUs, and it was generally well configured in terms of its Max Server Memory and other basic sp_configure settings. As you may well be aware, there are innumerable issues that can cause you to receive a memory-related error such as this, and some of the more basic troubleshooting tips are shown in the following MSDN article:
https://msdn.microsoft.com/en-us/library/aa337311(SQL.90).aspx
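As a starting point it’s always worth double-checking the basic memory settings for yourself. Here’s a minimal sketch using sp_configure (note that 'show advanced options' has to be enabled before the memory options are visible):

-- View the instance-level memory settings
-- ('max server memory (MB)' and 'min server memory (MB)' are advanced options)
exec sp_configure 'show advanced options', 1;
reconfigure;
exec sp_configure 'max server memory (MB)';
exec sp_configure 'min server memory (MB)';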
However in this case none of the basics were really of any help. Typical starting questions to ask yourself in this scenario are:
1. Is there a pattern to when this occurs?
2. Do I know of any particular workloads which are running when I see this problem?
…just to understand whether you have any particular potential causes in your own code. Our customer in this scenario was not aware of any, and unfortunately we didn’t have any dumps created which might have allowed us to see retrospectively which batches were running when the problem occurred.
However, we did have the partial output from DBCC MEMORYSTATUS which SQL Server had kindly written to the error log for us. An examination of this showed the following suspicious-looking section:
OBJECTSTORE_LOCK_MANAGER (node 2)
VM Reserved = 0 KB
VM Committed = 0 KB
AWE Allocated = 0 KB
SM Reserved = 0 KB
SM Committed = 0 KB
SinglePage Allocator = 29398856 KB
MultiPage Allocator = 0 KB

Global Memory Objects:
Resource= 709
Locks= 3676149
SE Schema Mgr= 1646
SQLCache= 261
Replication= 2
ServerGlobal= 109
XPGlobal= 2
Xact= 101
SETLS= 16
DatasetMemObjs= 32
SubpDescPmos= 16
SortTables= 2
From this we can infer that we seem to have experienced a problem equivalent to the one I wrote about here, although manifesting distinctly different symptoms.
At the end of the day, one of the NUMA nodes has taken up about 29GB of memory just managing lock resources, and this is confirmed by the number of locks reported in the global memory objects. Remember that the majority of the output from DBCC MEMORYSTATUS is publicly documented here:
How to use the DBCC MEMORYSTATUS command to monitor memory usage on SQL Server 2005
https://support.microsoft.com/kb/907877
This 29GB of memory purely managing locks appears to be placing an undue load on the memory resources of the system, and what we appear to have here is a runaway or huge batch of some type, or maybe a very large open transaction.
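If the instance is still responsive, you can also look at the size of the lock manager memory clerk live, rather than waiting for DBCC MEMORYSTATUS output to appear in the error log. A minimal sketch against sys.dm_os_memory_clerks (the column names shown are the SQL Server 2005/2008 ones; later versions consolidated the page allocator columns):

-- How much memory the lock manager clerk holds on each NUMA node
-- (single_pages_kb / multi_pages_kb are the SQL Server 2005/2008 column names)
select memory_node_id,
       sum(single_pages_kb) as single_pages_kb,
       sum(multi_pages_kb)  as multi_pages_kb
from sys.dm_os_memory_clerks
where type = 'OBJECTSTORE_LOCK_MANAGER'
group by memory_node_id
order by memory_node_id;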
The way to resolve this problem is to identify the batch and stop it from running. One good way to identify this problem proactively might be to use the tiny DMV query I wrote in my previous post about this type of situation and attach a monitor to it, warning you if the value goes above, say, 1 million, or some other high number of your choice.
-- Number of lock resources held per session; a very large count flags a runaway batch
select request_session_id, count(*) as lock_count
from sys.dm_tran_locks
group by request_session_id
You could then easily identify the problem session_id before it got completely out of hand. There are of course many different alternative ways of capturing such data or doing such monitoring. Maybe you’d prefer to create a custom SQLDIAG package to monitor your server in more detail.
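As one example of such monitoring, here is a minimal sketch of a threshold check that could be scheduled in a SQL Agent job. The 1,000,000 threshold is just the illustrative figure mentioned above; adjust it to suit your system:

-- Warn (and write to the error log) if any session holds more than 1,000,000 lock resources
declare @session_id int, @lock_count int;

select top 1 @session_id = request_session_id, @lock_count = count(*)
from sys.dm_tran_locks
group by request_session_id
having count(*) > 1000000
order by count(*) desc;

if @session_id is not null
    raiserror('Session %d is holding %d lock resources', 10, 1, @session_id, @lock_count) with log;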
The moral of this story, though, is that even in the most critical system failures such as this, there may well be enough information to get a hint of where the problem lies, if you make sure you go through the error logs in detail. Remember not to be overwhelmed by the sheer volume of errors that might be reported; go through all the lines, removing the duplicates, to see whether something of interest is hidden amongst all the noise.