System Health Session: Part 4

This is the last post for the System Health Session series. In this post, I shall show you how to extract deadlock related information from the deadlock graph captured by the System Health Session.

The deadlock graph captured typically has three distinct nodes:

victim-list – Deadlock victim’s process identifier
process-list – Information pertaining to all the processes involved in the deadlock
resource-list – Information about the resources involved in the deadlock

The query below will provide you with the time stamp when the deadlock was reported along with victim process identifier.

 -- Fetch the Health Session data into a temporary table

SELECT CAST(xet.target_data AS XML) AS XMLDATA
INTO #SystemHealthSessionData
FROM sys.dm_xe_session_targets xet
JOIN sys.dm_xe_sessions xe
ON (xe.address = xet.event_session_address)
WHERE xe.name = 'system_health'
-- Gets the Deadlock Event Time and Victim Process
SELECT C.query('.').value('(/event/@timestamp)[1]', 'datetime') as EventTime,
CAST(C.query('.').value('(/event/data/value)[1]', 'varchar(MAX)') AS XML).value('(<a>/deadlock/victim-list/victimProcess/@id)[1]','varchar(100)'</a>) VictimProcess
FROM #SystemHealthSessionData a
CROSS APPLY a.XMLDATA.nodes('/RingBufferTarget/event') as T(C)
WHERE C.query('.').value('(/event/@name)[1]', 'varchar(255)') = 'xml_deadlock_report'
-- Drop the temporary table
DROP TABLE #SystemHealthSessionData 

The next query (when provided with an event time from the above query output)
will provide you a parsed version of the process list in a tabular format which
can be easier to read when you have a large number of sessions involved in the
deadlock.

 -- Fetch the Health Session data into a temporary table

SELECT CAST(xet.target_data AS XML) AS XMLDATA
INTO #SystemHealthSessionData
FROM sys.dm_xe_session_targets xet
JOIN sys.dm_xe_sessions xe
ON (xe.address = xet.event_session_address)
WHERE xe.name = 'system_health'

-- Parses the process list for a specific deadlock once provided with an event time for the deadlock from the above output

;WITH CTE_HealthSession (EventXML) AS
(
SELECT CAST(C.query('.').value('(/event/data/value)[1]', 'varchar(MAX)') AS XML) EventXML
FROM #SystemHealthSessionData a
CROSS APPLY a.XMLDATA.nodes('/RingBufferTarget/event') as T(C)
WHERE C.query('.').value('(/event/@name)[1]', 'varchar(255)') = 'xml_deadlock_report'
AND C.query('.').value('(/event/@timestamp)[1]', 'datetime') = '2011-09-28 06:24:44.700' -- Replace with relevant timestamp
)
SELECT DeadlockProcesses.value('(@id)[1]','varchar(50)') as id
,DeadlockProcesses.value('(@taskpriority)[1]','bigint') as taskpriority
,DeadlockProcesses.value('(@logused)[1]','bigint') as logused
,DeadlockProcesses.value('(@waitresource)[1]','varchar(100)') as waitresource
,DeadlockProcesses.value('(@waittime)[1]','bigint') as waittime
,DeadlockProcesses.value('(@ownerId)[1]','bigint') as ownerId
,DeadlockProcesses.value('(@transactionname)[1]','varchar(50)') as transactionname
,DeadlockProcesses.value('(@lasttranstarted)[1]','varchar(50)') as lasttranstarted
,DeadlockProcesses.value('(@XDES)[1]','varchar(20)') as XDES
,DeadlockProcesses.value('(@lockMode)[1]','varchar(5)') as lockMode
,DeadlockProcesses.value('(@schedulerid)[1]','bigint') as schedulerid
,DeadlockProcesses.value('(@kpid)[1]','bigint') as kpid
,DeadlockProcesses.value('(@status)[1]','varchar(20)') as status
,DeadlockProcesses.value('(@spid)[1]','bigint') as spid
,DeadlockProcesses.value('(@sbid)[1]','bigint') as sbid
,DeadlockProcesses.value('(@ecid)[1]','bigint') as ecid
,DeadlockProcesses.value('(@priority)[1]','bigint') as priority
,DeadlockProcesses.value('(@trancount)[1]','bigint') as trancount
,DeadlockProcesses.value('(@lastbatchstarted)[1]','varchar(50)') as lastbatchstarted
,DeadlockProcesses.value('(@lastbatchcompleted)[1]','varchar(50)') as lastbatchcompleted
,DeadlockProcesses.value('(@clientapp)[1]','varchar(150)') as clientapp
,DeadlockProcesses.value('(@hostname)[1]','varchar(50)') as hostname
,DeadlockProcesses.value('(@hostpid)[1]','bigint') as hostpid
,DeadlockProcesses.value('(@loginname)[1]','varchar(150)') as loginname
,DeadlockProcesses.value('(@isolationlevel)[1]','varchar(150)') as isolationlevel
,DeadlockProcesses.value('(@xactid)[1]','bigint') as xactid
,DeadlockProcesses.value('(@currentdb)[1]','bigint') as currentdb
,DeadlockProcesses.value('(@lockTimeout)[1]','bigint') as lockTimeout
,DeadlockProcesses.value('(@clientoption1)[1]','bigint') as clientoption1
,DeadlockProcesses.value('(@clientoption2)[1]','bigint') as clientoption2
FROM (select EventXML as DeadlockEvent FROM CTE_HealthSession) T
CROSS APPLY DeadlockEvent.nodes('//deadlock/process-list/process') AS R(DeadlockProcesses)

-- Drop the temporary table
DROP TABLE #SystemHealthSessionData

The script file for the above queries can be downloaded from here.

A sample output of the above two queries is shown below:

The second dataset which shows the parsed process list from the deadlock graph is for the timestamp highlighted above.

As mentioned in one of my previous posts, the custom reports used in the previous posts can be downloaded from here (Filename: System_Health_Session_Custom_Reports.zip).

Modification April 20th, 2012: Just updated the .sql files and added the Deadlock Report to the SkyDrive location.

Note: To make use of the deadlock graph captured by the System Health Session, you need to have the required update applied to avoid the issue mentioned in KB978629. The issue is addressed in:
981355    Cumulative Update package 1 for SQL Server 2008 R2
http://support.microsoft.com/default.aspx?scid=kb;EN-US;981355
977443    Cumulative update package 6 for SQL Server 2008 Service Pack 1
http://support.microsoft.com/default.aspx?scid=kb;EN-US;977443

If you don’t have the updates installed, then Jonathan (Blog | Twitter) has shown in his article Retrieving Deadlock Graphs with SQL Server 2008 Extended Events how to workaround the issue. Michael Zilberstein’s (Blog) article, Parsing Extended Events xml_deadlock_report, has an updated/corrected version of the T-SQL to fetch the deadlock information.

References:

Bart Duncan’s Weblog:
Deadlock Troubleshooting, Part 1
Deadlock Troubleshooting, Part 2
Deadlock Troubleshooting, Part 3

25 thoughts on “System Health Session: Part 4

  1. Hi Amit,

    Thank you for another great post!

    One small issue I noticed is that some extra characters seem to have crept into your first query– they’re tags. Guessing maybe your blog editor “fixed” something for you there? (Hopefully the characters post literally in this comment and aren’t stripped out– they’re “a” tags used for links in HTML.)

    Best,
    Kendra

    Like

  2. Hi Amit,

    awesome reports. I cannot say it in other words. This is great stuff.
    One little thing:
    Starting from the system health session dashboard and going to xml_deadlock_report there´s an error due to a missing file named
    XEDashboard.rdl

    Regards
    Dirk

    Like

  3. Thanks Dirk. That is an oversight on my part. I forgot to upload the deadlock RDL file in the uploaded reports. Thanks for reporting the issue. I need to add the deadlock report. Will do that soon and send out a tweet about the same.

    Like

  4. Pingback: Awesomesauce SQL Server 2012: System Health Session « TroubleshootingSQL

  5. Pingback: System Health Session and Deadlocks « TroubleshootingSQL

  6. Pingback: Troubleshooting deadlocks « sqlindian – On Locks and Deadlocks

  7. Pingback: System Health Session and beyond « TroubleshootingSQL

  8. Pingback: System Health Session and beyond « TroubleshootingSQL

  9. Pingback: SQL Bangalore UG Meet « TroubleshootingSQL

  10. Amit, awesome posts. Can you add the XEDashboard.rdl report? From the main Dashboard (XEHealthSessionDashboard), if I click on security_error_ring_buffer_recorded, I get a message about the XEDashboard.rdl being missing. I just downloaded the ZIP with the reports off of skydrive. Thanks!

    Like

  11. Amit, apologies for the newbie questions. I uploaded it to my Reporting Services server, but I’m getting an error because I don’t have DataSource1. If I create the datasource, it looks like I have to tie the report to just one machine. Am I doing something wrong? Help greatly appreciated.

    Like

  12. (apologies is this is a dupe)
    Amit, apologies for the newbie questions. I uploaded it to my Reporting Services server, but I’m getting an error because I don’t have DataSource1. If I create the datasource, it looks like I have to tie the report to just one machine. Am I doing something wrong? Help greatly appreciated.

    Like

  13. That is an oversight on my part. You can remove the Datasource1. I was using that for my development environment. The report uses the ServerName datasource which is a dynamic datasource. Hope this clarifies your question.

    Like

  14. Ever noticed how much an extra XMLIndex cuts of subsequent queries processing time 🙂

    /* Fetch the Health Session data into a temporary table */
    if object_id(‘tempdb..#SystemHealthSessionData’) is null
    begin
    /* ALZDBA: add XMLIndex for faster consumption ! */
    Create table #SystemHealthSessionData
    ( PK int not null primary key clustered,
    XMLDATA xml not null
    );
    CREATE PRIMARY XML INDEX PXML_SystemHealthSessionData
    ON #SystemHealthSessionData (XMLDATA);

    insert INTO #SystemHealthSessionData

    SELECT cast( 1 as int ) as PK, CAST(xet.target_data AS XML) AS XMLDATA
    FROM sys.dm_xe_session_targets xet
    INNER JOIN sys.dm_xe_sessions xe
    ON ( xe.address = xet.event_session_address )
    WHERE xe.name = ‘system_health’;

    end ;

    Nice reports !

    Like

  15. Amit, have you thought about how in SQL 2012 we have two targets for the system_health, a file and the ring buffer. When you don’t define which target to use, it seems to magically get to the file target, but does it query the ring buffer also? I have a system that fills the ring buffer easily, so I would like to skip it if ultimately not useful. Thanks for the great post!

    Like

  16. Well, I was wrong, it seems to not get the file target information, it is just a pointer to the file, so I am off to find out how to consume the file_target or create a dedicated dead_lock ring_buffer session to monitor this server.

    Like

  17. I dont have access to skydrive and therefore can not download sql or report. is there another way I can get a least the report ?
    thank you

    Like

Comments are closed.