SQL Server 2012: Debugging Deadlocked Schedulers

I had blogged about debugging deadlocked schedulers earlier for SQL Server 2008 R2 and below releases. Since there have been some fundamental changes in SQL Server 2012, I thought it would be a good idea to post about debugging the same scenario on SQL Server 2012 instances.

Older posts:
Debugging Deadlocked Schedulers Part 1
Debugging Deadlocked Schedulers Part 2

The Errorlog shows the following for a deadlocked schedulers issue reported on the SQL Server instance:

2012-12-01 02:50:36.69 Server      ***Stack Dump being sent to C:\<path>\LOG\SQLDump0002.txt
2012-12-01 02:50:36.69 Server      * BEGIN STACK DUMP:
2012-12-01 02:50:36.69 Server      *   12/01/12 02:50:36 spid 7824
2012-12-01 02:50:36.69 Server      * Deadlocked Schedulers
….
….
2012-12-01 02:50:42.86 Server      New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 300 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the “max worker threads” configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 0%. System Idle: 88%.

The first thing that we would need to do is load the memory dump (mini-dump) using Windows Debugger and then align the symbols using the following commands:

.sympath srv*F:\debugging sql server\publicSymbols*http://msdl.microsoft.com/download/symbols/
.reload /f

Post that you can verify that the symbols for SQL Server have been loaded correctly.

0:006> lmvm sqlservr
start             end                 module name
000007f7`a08d0000 000007f7`a090e000   sqlservr   (pdb symbols)          f:\debugging sql server\publicsymbols\sqlservr.pdb\FBC1DEBA96C44E7484B3B43824BD5C512\sqlservr.pdb
    Loaded symbol image file: sqlservr.exe
    Mapped memory image file: C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn\sqlservr.exe
    Image path: C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn\sqlservr.exe
    Image name: sqlservr.exe
    File version:     2011.110.3000.0
    Product version:  11.0.3000.0
    ProductName:      Microsoft SQL Server
    InternalName:     SQLSERVR

Let us assume here that I do not know the reason why the deadlocked schedulers issue was reported. So I will use the ~* command to find out the number of threads in the memory dump file that I am inspecting. A screenshot of the output is shown below.

image

Then I use the !uniqstack command to get the number of unique stacks in the memory dump file. As you can see from the output below, that there are 655 threads out of which 615 are duplicate.

Total threads: 655
Duplicate callstacks: 615 (windbg thread #s follow):
2, 5, 7, 9, 11, 13, 15, 17, 24, 25, 30, 33, 34, 35, 36, 37, 38, 43, 44, 45, 46,


642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654

So now it’s time to inspect a few of the duplicate threads. Starting with Thread ID 643, I see the following callstack using the commands

~643s
kL100

ntdll!NtSignalAndWaitForSingleObject+0xa
KERNELBASE!SignalObjectAndWait+0x82
sqldk!SOS_Scheduler::Switch+0xb5
sqldk!SOS_Scheduler::SuspendNonPreemptive+0xd2
sqlmin!EventInternal<SuspendQueueSLock>::Wait+0x1ee
sqlmin!LockOwner::Sleep+0x495
sqlmin!lck_lockInternal+0xe9b

sqlmin!MDL::LockObjectLocal+0x446
sqlmin!SMD::LockObjectAndCheckVersion+0x86
sqllang!CRangeObject::XretSchemaChanged+0x3f0
sqllang!CRangeTable::XretSchemaChanged+0x1f
sqllang!CEnvCollection::XretSchemaChanged+0xe1
sqllang!CXStmtQuery::XretSchemaChanged+0x11c
sqllang!CXStmtSelect::XretExecute+0x1be
sqllang!CMsqlExecContext::ExecuteStmts<1,1>+0x400
sqllang!CMsqlExecContext::FExecute+0xa33
sqllang!CSQLSource::Execute+0x866
sqllang!process_request+0x73c
sqllang!process_commands+0x51c
sqldk!SOS_Task::Param::Execute+0x21e
sqldk!SOS_Scheduler::RunTask+0xa8
sqldk!SOS_Scheduler::ProcessTasks+0x29a
sqldk!SchedulerManager::WorkerEntryPoint+0x261
sqldk!SystemThread::RunWorker+0x8f
sqldk!SystemThreadDispatcher::ProcessWorker+0x3c8
sqldk!SchedulerManager::ThreadEntryPoint+0x236
kernel32!BaseThreadInitThunk+0x1a
ntdll!RtlUserThreadStart+0x1d

If you look at the callstack mentioned above, you will notice that some of the module names are different. So the approach that you used to debug your SQL Server 2008 R2 and below release memory dumps would not work here. Looking into the other call stacks, I find a similar pattern. This tells me that most of the threads are waiting on locks (note the frames highlighted in red above).

Since I now know that this is a blocking issue, I will attempt to find the total number of callstacks which have the lock waits.

Command: !findstack sqlmin!LockOwner::Sleep 2

Note that the module name is different here! The output of the above command will be similar to the snippet shown below.

Thread 653, 1 frame(s) match
          00 ntdll!NtSignalAndWaitForSingleObject+0xa
          01 KERNELBASE!SignalObjectAndWait+0x82
          02 sqldk!SOS_Scheduler::Switch+0xb5
          03 sqldk!SOS_Scheduler::SuspendNonPreemptive+0xd2
          04 sqlmin!EventInternal<SuspendQueueSLock>::Wait+0x1ee
        * 05 sqlmin!LockOwner::Sleep+0x495
          06 sqlmin!lck_lockInternal+0xe9b
          07 sqlmin!MDL::LockObjectLocal+0x446
          08 sqlmin!SMD::LockObjectAndCheckVersion+0x86
          09 sqllang!CRangeObject::XretSchemaChanged+0x3f0
          10 sqllang!CRangeTable::XretSchemaChanged+0x1f
          11 sqllang!CEnvCollection::XretSchemaChanged+0xe1
          12 sqllang!CXStmtQuery::XretSchemaChanged+0x11c
          13 sqllang!CXStmtSelect::XretExecute+0x1be
          14 sqllang!CMsqlExecContext::ExecuteStmts<1,1>+0x400
          15 sqllang!CMsqlExecContext::FExecute+0xa33
          16 sqllang!CSQLSource::Execute+0x866
          17 sqllang!process_request+0x73c
          18 sqllang!process_commands+0x51c
          19 sqldk!SOS_Task::Param::Execute+0x21e
          20 sqldk!SOS_Scheduler::RunTask+0xa8
          21 sqldk!SOS_Scheduler::ProcessTasks+0x29a
          22 sqldk!SchedulerManager::WorkerEntryPoint+0x261
          23 sqldk!SystemThread::RunWorker+0x8f
          24 sqldk!SystemThreadDispatcher::ProcessWorker+0x3c8
          25 sqldk!SchedulerManager::ThreadEntryPoint+0x236
          26 kernel32!BaseThreadInitThunk+0x1a
          27 ntdll!RtlUserThreadStart+0x1d

Using the !findstack command, I found that the number of such threads are 585 which is a whopping 89% of the total threads!!

Since we are looking at a SQL Server 2012 issue, we have the System Health Session default session logs which run by default on any SQL Server 2012 instance. If you have the .xel files available in the LOG folder, then you can use these files to dig into the blocking chains which would have been captured in the archived .xel files. I have documented on how to do this in a previous blog post of mine.

Also check if the Deadlocked Schedulers matches any KB Article (update) already released. Remember that deadlocked schedulers is a symptom and the not the cause!! So make sure that if you find a KB Article which talks about a Deadlocked Schedulers issue being addressed in a later build, the symptoms match the symptoms that you are experiencing on your server! If in doubt, ask CSS!

If you are stuck and have no clue why the issue happened, then this would be a good time to engage Microsoft CSS Support and provide them with the SQL Errorlogs, the dump file, the archive .xel files in the LOG folder and the windows event logs in .txt format for further analysis.

Reference:
KB 315263 – How to read the small memory dump files that Windows creates for debugging
How To Diagnose and Correct Errors 17883, 17884, 17887, and 17888 whitepaper