Handling Deadlocked Schedulers is a piece of cake now


I had written walkthroughs (Part 1 | Part 2) on how to troubleshoot a Deadlocked Schedulers condition for SQL Server instances. Deadlocked Schedulers is a condition where all your SQL Server worker threads are exhausted and no new work requests are being picked up by the SQL Server instance.

Starting from SQL Server 2012, the System Health extended events session tracks deadlocked schedulers condition using the extended event (scheduler_monitor_deadlock_ring_buffer_recorded). The session tracks other useful events which makes it easy to trace back the series of events which led to the deadlocked schedulers condition!

I will be using the Extended Events UI in SQL Server 2012 management studio to show how the target file of the System Health session can be used to track deadlocked schedulers condition experienced by your SQL Server instance.

Notice that the time reported for the event is around 5:03PM (screenshot 1).

image

Looking into the details of each of the events reported above, I find the following (Screenshot 2):

image

It is very clear to me that I have a condition on my SQL Server instance where no new work requests are being accepted. The SQL Server instance is hardly using any CPU and the server is also quite idle (see Screenshot 2 for CPU usage details). Since the memory utilization shows up as 100%, I can say with a degree of certainty that I don’t have a memory contention issue.

So what was the issue? Let’s see if we can dig in further. The story so far is that I know about the deadlocked schedulers condition experienced by the SQL Server instance and also know that the CPU usage during that time was low on the server. Furthermore, there was no memory related issues during that time. Looking into the events reported by the System Health Session, I find there are large number of waits reported (see Screenshot 3)

image

I see that I have a large number of waits recorded around that time for lock related waits and worker waits. SOS_WORKER wait type is reported when a work request is waiting for a worker thread to be assigned to it and LCK_M_X and LCK_M_U are your typical lock blocking related waits!!

So now I know that I had a blocking issues during that time. Expanding each of those wait_info events in the Extended Events GUI in Management Studio will give the details of the wait and the query which was waiting for the lock.

The sp_server_diagnostics components, specifically the Query_Processing component, provides additional insight into this scenario. The XML data available in the component result (use the Extended Events GUI grouping option on the Component column) can help you track down the blocking and blocked tasks with the blocked process report. The pending tasks will give you information about which sessions have pending tasks and if you were tracking down CPU intensive queries, then this can help as well (see Screenshot 4).

image

In summary, I now know that there was a worker thread exhaustion issue due to a large blocking chain (one of the most common reasons for deadlocked schedulers) on my SQL Server instance…. And I did not even have to write a single query. I was able to figure this out with the help of the Extended Events UI from Management Studio!

If you want help retrieving the raw XML data from the QUERY_PROCESSING component result, then the query below can be helpful:


/*
T-SQL script to fetch sp_server_diagnostics information from the System Health Sessions .XEL files stored in the SQL Server instance LOG folder and track down information for the QUERY_PROCESSING component resultset.

Note: This works only for SQL Server 2012 instances.

Author: Amit Banerjee
Contact details:
Blog: www.troubleshootingsql.com
Twitter: <a href="http://twitter.com/banerjeeamit">http://twitter.com/banerjeeamit</a>
Email: troubleshootingsql@outlook.com

DISCLAIMER:
This Sample Code is provided for the purpose of illustration only and is not intended to be used in a production environment. THIS SAMPLE CODE AND ANY RELATED INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE. We grant You a nonexclusive, royalty-free right to use and modify the Sample Code and to reproduce and distribute the object code form of the Sample Code, provided that You agree: (i) to not use Our name, logo, or trademarks to market Your software product in which the Sample Code is embedded; (ii) to include a valid copyright notice on Your software product in which the Sample Code is embedded; and (iii) to indemnify, hold harmless, and defend Us and Our suppliers from and against any claims or lawsuits, including attorneys’ fees, that arise or result from the use or distribution of the Sample Code.
*/

SET NOCOUNT ON

IF (SUBSTRING(CAST(SERVERPROPERTY ('ProductVersion') AS varchar(50)),1,CHARINDEX('.',CAST(SERVERPROPERTY ('ProductVersion') AS varchar(50)))-1) >= 11)
BEGIN

DECLARE @UTDDateDiff int
SET @UTDDateDiff = DATEDIFF(mi,GETUTCDATE(),GETDATE())

-- Fetch information about the XEL file location
DECLARE @filename varchar(8000) ;
SELECT @filename = CAST(target_data as XML).value('(/EventFileTarget/File/@name)[1]', 'varchar(8000)')
FROM sys.dm_xe_session_targets
WHERE target_name = 'event_file' and event_session_address = (select address from sys.dm_xe_sessions where name = 'system_health');

SET @filename = SUBSTRING(@filename,1,CHARINDEX('system_health',@filename,1)-1) + '*.xel';

-- Read the XEL files to get the System Health Session Data
SELECT object_name,CAST(event_data as XML) as XMLData
INTO #tbl_sp_server_diagnostics
FROM sys.fn_xe_file_target_read_file(@filename, null, null, null)
WHERE object_name = 'sp_server_diagnostics_component_result'

SELECT
DATEADD(mi,@UTDDateDiff,XMLData.value('(/event/@timestamp)[1]','datetime')) as EventTime,
XMLData.value('(/event/data/text)[1]','varchar(255)') as Component,
XMLData.value('(/event/data/text)[2]','varchar(255)') as [State]
FROM #tbl_sp_server_diagnostics
WHERE  XMLData.value('(/event/data/text)[2]','varchar(255)')  <> 'CLEAN'
and XMLData.value('(/event/data/text)[1]','varchar(255)') = 'QUERY_PROCESSING'
ORDER BY EventTime DESC

SELECT
DATEADD(mi,@UTDDateDiff,XMLData.value('(/event/@timestamp)[1]','datetime')) as [Event Time],
XMLData.value('(/event/data/text)[1]','varchar(255)') as Component,
XMLData.value('(/event/data/value/queryProcessing/@maxWorkers)[1]','bigint') as [Max Workers],
XMLData.value('(/event/data/value/queryProcessing/@workersCreated)[1]','bigint') as [Workers Created],
XMLData.value('(/event/data/value/queryProcessing/@workersIdle)[1]','bigint') as [Idle Workers],
XMLData.value('(/event/data/value/queryProcessing/@pendingTasks)[1]','bigint') as [Pending Tasks],
XMLData.value('(/event/data/value/queryProcessing/@hasUnresolvableDeadlockOccurred)[1]','int') as [Unresolvable Deadlock],
XMLData.value('(/event/data/value/queryProcessing/@hasDeadlockedSchedulersOccurred)[1]','int') as [Deadlocked Schedulers],
XMLData
FROM #tbl_sp_server_diagnostics
WHERE XMLData.value('(/event/data/text)[1]','varchar(255)') = 'QUERY_PROCESSING'
and XMLData.value('(/event/data/text)[2]','varchar(255)') <> 'CLEAN'
ORDER BY [Event Time] DESC

DROP TABLE #tbl_sp_server_diagnostics

END

SET NOCOUNT OFF 
Advertisements

5 thoughts on “Handling Deadlocked Schedulers is a piece of cake now

  1. Pingback: SQL Server 2012: Debugging Deadlocked Schedulers « TroubleshootingSQL

It is always good to hear from you! :)

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s