I had blogged about debugging deadlocked schedulers earlier for SQL Server 2008 R2 and below releases. Since there have been some fundamental changes in SQL Server 2012, I thought it would be a good idea to post about debugging the same scenario on SQL Server 2012 instances.
Older posts:
Debugging Deadlocked Schedulers Part 1
Debugging Deadlocked Schedulers Part 2
The Errorlog shows the following for a deadlocked schedulers issue reported on the SQL Server instance:
2012-12-01 02:50:36.69 Server ***Stack Dump being sent to C:\<path>\LOG\SQLDump0002.txt
2012-12-01 02:50:36.69 Server * BEGIN STACK DUMP:
2012-12-01 02:50:36.69 Server * 12/01/12 02:50:36 spid 7824
2012-12-01 02:50:36.69 Server * Deadlocked Schedulers
….
….
2012-12-01 02:50:42.86 Server New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 300 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the “max worker threads” configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 0%. System Idle: 88%.
The first thing that we would need to do is load the memory dump (mini-dump) using Windows Debugger and then align the symbols using the following commands:
.sympath srv*F:\debugging sql server\publicSymbols*http://msdl.microsoft.com/download/symbols/
.reload /f
Post that you can verify that the symbols for SQL Server have been loaded correctly.
0:006> lmvm sqlservr
start end module name
000007f7`a08d0000 000007f7`a090e000 sqlservr (pdb symbols) f:\debugging sql server\publicsymbols\sqlservr.pdb\FBC1DEBA96C44E7484B3B43824BD5C512\sqlservr.pdb
Loaded symbol image file: sqlservr.exe
Mapped memory image file: C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn\sqlservr.exe
Image path: C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn\sqlservr.exe
Image name: sqlservr.exe
File version: 2011.110.3000.0
Product version: 11.0.3000.0
ProductName: Microsoft SQL Server
InternalName: SQLSERVR
Let us assume here that I do not know the reason why the deadlocked schedulers issue was reported. So I will use the ~* command to find out the number of threads in the memory dump file that I am inspecting. A screenshot of the output is shown below.
Then I use the !uniqstack command to get the number of unique stacks in the memory dump file. As you can see from the output below, that there are 655 threads out of which 615 are duplicate.
Total threads: 655
Duplicate callstacks: 615 (windbg thread #s follow):
2, 5, 7, 9, 11, 13, 15, 17, 24, 25, 30, 33, 34, 35, 36, 37, 38, 43, 44, 45, 46,
…
…
642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654
So now it’s time to inspect a few of the duplicate threads. Starting with Thread ID 643, I see the following callstack using the commands
~643s
kL100
ntdll!NtSignalAndWaitForSingleObject+0xa
KERNELBASE!SignalObjectAndWait+0x82
sqldk!SOS_Scheduler::Switch+0xb5
sqldk!SOS_Scheduler::SuspendNonPreemptive+0xd2
sqlmin!EventInternal<SuspendQueueSLock>::Wait+0x1ee
sqlmin!LockOwner::Sleep+0x495
sqlmin!lck_lockInternal+0xe9b
sqlmin!MDL::LockObjectLocal+0x446
sqlmin!SMD::LockObjectAndCheckVersion+0x86
sqllang!CRangeObject::XretSchemaChanged+0x3f0
sqllang!CRangeTable::XretSchemaChanged+0x1f
sqllang!CEnvCollection::XretSchemaChanged+0xe1
sqllang!CXStmtQuery::XretSchemaChanged+0x11c
sqllang!CXStmtSelect::XretExecute+0x1be
sqllang!CMsqlExecContext::ExecuteStmts<1,1>+0x400
sqllang!CMsqlExecContext::FExecute+0xa33
sqllang!CSQLSource::Execute+0x866
sqllang!process_request+0x73c
sqllang!process_commands+0x51c
sqldk!SOS_Task::Param::Execute+0x21e
sqldk!SOS_Scheduler::RunTask+0xa8
sqldk!SOS_Scheduler::ProcessTasks+0x29a
sqldk!SchedulerManager::WorkerEntryPoint+0x261
sqldk!SystemThread::RunWorker+0x8f
sqldk!SystemThreadDispatcher::ProcessWorker+0x3c8
sqldk!SchedulerManager::ThreadEntryPoint+0x236
kernel32!BaseThreadInitThunk+0x1a
ntdll!RtlUserThreadStart+0x1d
If you look at the callstack mentioned above, you will notice that some of the module names are different. So the approach that you used to debug your SQL Server 2008 R2 and below release memory dumps would not work here. Looking into the other call stacks, I find a similar pattern. This tells me that most of the threads are waiting on locks (note the frames highlighted in red above).
Since I now know that this is a blocking issue, I will attempt to find the total number of callstacks which have the lock waits.
Command: !findstack sqlmin!LockOwner::Sleep 2
Note that the module name is different here! The output of the above command will be similar to the snippet shown below.
Thread 653, 1 frame(s) match
00 ntdll!NtSignalAndWaitForSingleObject+0xa
01 KERNELBASE!SignalObjectAndWait+0x82
02 sqldk!SOS_Scheduler::Switch+0xb5
03 sqldk!SOS_Scheduler::SuspendNonPreemptive+0xd2
04 sqlmin!EventInternal<SuspendQueueSLock>::Wait+0x1ee
* 05 sqlmin!LockOwner::Sleep+0x495
06 sqlmin!lck_lockInternal+0xe9b
07 sqlmin!MDL::LockObjectLocal+0x446
08 sqlmin!SMD::LockObjectAndCheckVersion+0x86
09 sqllang!CRangeObject::XretSchemaChanged+0x3f0
10 sqllang!CRangeTable::XretSchemaChanged+0x1f
11 sqllang!CEnvCollection::XretSchemaChanged+0xe1
12 sqllang!CXStmtQuery::XretSchemaChanged+0x11c
13 sqllang!CXStmtSelect::XretExecute+0x1be
14 sqllang!CMsqlExecContext::ExecuteStmts<1,1>+0x400
15 sqllang!CMsqlExecContext::FExecute+0xa33
16 sqllang!CSQLSource::Execute+0x866
17 sqllang!process_request+0x73c
18 sqllang!process_commands+0x51c
19 sqldk!SOS_Task::Param::Execute+0x21e
20 sqldk!SOS_Scheduler::RunTask+0xa8
21 sqldk!SOS_Scheduler::ProcessTasks+0x29a
22 sqldk!SchedulerManager::WorkerEntryPoint+0x261
23 sqldk!SystemThread::RunWorker+0x8f
24 sqldk!SystemThreadDispatcher::ProcessWorker+0x3c8
25 sqldk!SchedulerManager::ThreadEntryPoint+0x236
26 kernel32!BaseThreadInitThunk+0x1a
27 ntdll!RtlUserThreadStart+0x1d
Using the !findstack command, I found that the number of such threads are 585 which is a whopping 89% of the total threads!!
Since we are looking at a SQL Server 2012 issue, we have the System Health Session default session logs which run by default on any SQL Server 2012 instance. If you have the .xel files available in the LOG folder, then you can use these files to dig into the blocking chains which would have been captured in the archived .xel files. I have documented on how to do this in a previous blog post of mine.
Also check if the Deadlocked Schedulers matches any KB Article (update) already released. Remember that deadlocked schedulers is a symptom and the not the cause!! So make sure that if you find a KB Article which talks about a Deadlocked Schedulers issue being addressed in a later build, the symptoms match the symptoms that you are experiencing on your server! If in doubt, ask CSS!
If you are stuck and have no clue why the issue happened, then this would be a good time to engage Microsoft CSS Support and provide them with the SQL Errorlogs, the dump file, the archive .xel files in the LOG folder and the windows event logs in .txt format for further analysis.
Reference:
KB 315263 – How to read the small memory dump files that Windows creates for debugging
How To Diagnose and Correct Errors 17883, 17884, 17887, and 17888 whitepaper