I had blogged about debugging deadlocked schedulers earlier for SQL Server 2008 R2 and below releases. Since there have been some fundamental changes in SQL Server 2012, I thought it would be a good idea to post about debugging the same scenario on SQL Server 2012 instances.
Debugging Deadlocked Schedulers Part 1
Debugging Deadlocked Schedulers Part 2
The Errorlog shows the following for a deadlocked schedulers issue reported on the SQL Server instance:
2012-12-01 02:50:36.69 Server ***Stack Dump being sent to C:\<path>\LOG\SQLDump0002.txt
2012-12-01 02:50:36.69 Server * BEGIN STACK DUMP:
2012-12-01 02:50:36.69 Server * 12/01/12 02:50:36 spid 7824
2012-12-01 02:50:36.69 Server * Deadlocked Schedulers
2012-12-01 02:50:42.86 Server New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 300 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the “max worker threads” configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 0%. System Idle: 88%.
The first thing that we would need to do is load the memory dump (mini-dump) using Windows Debugger and then align the symbols using the following commands:
.sympath srv*F:\debugging sql server\publicSymbols*http://msdl.microsoft.com/download/symbols/
Post that you can verify that the symbols for SQL Server have been loaded correctly.
0:006> lmvm sqlservr
start end module name
000007f7`a08d0000 000007f7`a090e000 sqlservr (pdb symbols) f:\debugging sql server\publicsymbols\sqlservr.pdb\FBC1DEBA96C44E7484B3B43824BD5C512\sqlservr.pdb
Loaded symbol image file: sqlservr.exe
Mapped memory image file: C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn\sqlservr.exe
Image path: C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn\sqlservr.exe
Image name: sqlservr.exe
File version: 2011.110.3000.0
Product version: 11.0.3000.0
ProductName: Microsoft SQL Server
Let us assume here that I do not know the reason why the deadlocked schedulers issue was reported. So I will use the ~* command to find out the number of threads in the memory dump file that I am inspecting. A screenshot of the output is shown below.
Then I use the !uniqstack command to get the number of unique stacks in the memory dump file. As you can see from the output below, that there are 655 threads out of which 615 are duplicate.
Total threads: 655
Duplicate callstacks: 615 (windbg thread #s follow):
2, 5, 7, 9, 11, 13, 15, 17, 24, 25, 30, 33, 34, 35, 36, 37, 38, 43, 44, 45, 46,
642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654
So now it’s time to inspect a few of the duplicate threads. Starting with Thread ID 643, I see the following callstack using the commands
If you look at the callstack mentioned above, you will notice that some of the module names are different. So the approach that you used to debug your SQL Server 2008 R2 and below release memory dumps would not work here. Looking into the other call stacks, I find a similar pattern. This tells me that most of the threads are waiting on locks (note the frames highlighted in red above).
Since I now know that this is a blocking issue, I will attempt to find the total number of callstacks which have the lock waits.
Command: !findstack sqlmin!LockOwner::Sleep 2
Note that the module name is different here! The output of the above command will be similar to the snippet shown below.
Thread 653, 1 frame(s) match
* 05 sqlmin!LockOwner::Sleep+0x495
Using the !findstack command, I found that the number of such threads are 585 which is a whopping 89% of the total threads!!
Since we are looking at a SQL Server 2012 issue, we have the System Health Session default session logs which run by default on any SQL Server 2012 instance. If you have the .xel files available in the LOG folder, then you can use these files to dig into the blocking chains which would have been captured in the archived .xel files. I have documented on how to do this in a previous blog post of mine.
Also check if the Deadlocked Schedulers matches any KB Article (update) already released. Remember that deadlocked schedulers is a symptom and the not the cause!! So make sure that if you find a KB Article which talks about a Deadlocked Schedulers issue being addressed in a later build, the symptoms match the symptoms that you are experiencing on your server! If in doubt, ask CSS!
If you are stuck and have no clue why the issue happened, then this would be a good time to engage Microsoft CSS Support and provide them with the SQL Errorlogs, the dump file, the archive .xel files in the LOG folder and the windows event logs in .txt format for further analysis.
KB 315263 – How to read the small memory dump files that Windows creates for debugging
How To Diagnose and Correct Errors 17883, 17884, 17887, and 17888 whitepaper