System Health Session: Part 3

In my last post, I had demonstrated how to fetch the waits information captured by the System Health session. In this post, I shall show the custom reports that can be built using the waits information fetched.

image

The above report gives a summary of all the waits recorded by the health session. I have a second level drill-down available which allows me to get specific information for each of the distinct waits reported above and lets me get the SQL Query that experienced the wait.

image

As you can see above, that I have the SQL query, the time when the wait was reported along with duration statistics of the wait.

Another category of events that are tracked by the System Health session are non-yielding scheduler conditions (reported using the error number 17883) in SQL Server. So if your SQL Server instance encountered a non-yielding scheduler condition, then this information can be tracked using System Health session. The query below will give you the information of all the non-yielding conditions detected by the SQL Server instance subject to condition that the ring buffer storing this information has not been overwritten.

-- Query to fetch non-yielding errors captured by the System Health Session
SET NOCOUNT ON

SELECT CAST(xet.target_data AS XML) AS XMLDATA
INTO #SystemHealthSessionData
FROM sys.dm_xe_session_targets xet
JOIN sys.dm_xe_sessions xe ON (xe.address = xet.event_session_address)
WHERE xe.name = 'system_health'

;WITH CTE_HealthSession (EventXML) AS
( SELECT C.query('.') EventXML
FROM #SystemHealthSessionData a
CROSS APPLY a.XMLDATA.nodes('/RingBufferTarget/event') as T(C)
WHERE C.query('.').value('(/event/@name)[1]', 'varchar(255)') = 'scheduler_monitor_non_yielding_ring_buffer_recorded' )
SELECT EventXML.value('(/event/@timestamp)[1]', 'datetime') as EventTime,
EventXML.value('(/event/data/value)[4]', 'int') as NodeID,
EventXML.value('(/event/data/value)[5]', 'int') as SchedulerID,
CASE EventXML.value('(/event/data/value)[3]', 'int') WHEN 0 THEN 'BEGIN' WHEN 1 THEN 'END' ELSE '' END AS DetectionStage,
EventXML.value('(/event/data/value)[6]', 'varchar(50)') as Worker,
EventXML.value('(/event/data/value)[7]', 'bigint') as Yields,
EventXML.value('(/event/data/value)[8]', 'int') as Worker_Utilization,
EventXML.value('(/event/data/value)[9]', 'int') as Process_Utilization,
EventXML.value('(/event/data/value)[10]', 'int') as System_Idle,
EventXML.value('(/event/data/value)[11]', 'bigint') as User_Mode_Time,
EventXML.value('(/event/data/value)[12]', 'bigint') as Kernel_Mode_Time,
EventXML.value('(/event/data/value)[13]', 'bigint') as Page_Faults,
EventXML.value('(/event/data/value)[14]', 'float') as Working_Set_Delta,
EventXML.value('(/event/data/value)[15]', 'bigint') as Memory_Utilization
FROM CTE_HealthSession
ORDER BY EventTime,Worker

DROP TABLE #SystemHealthSessionData

The .sql file for the above script is available here.

image

A sample output of the above query is shown on the left. An important information in this output is the worker address. Using this worker address, I can get the relevant messages pertaining to the non-yielding condition.

2011-09-27 21:57:51.560 Server       Process 0:0:0 (0x18c0) Worker 0x000000000606A1A0 appears to be non-yielding on Scheduler 5. Thread creation time: 12961597452926. Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 0%. System Idle 98%. Interval: 70077 ms.
2011-09-27 21:58:51.660 Server       Process 0:0:0 (0x18c0) Worker 0x000000000606A1A0 appears to be non-yielding on Scheduler 5. Thread creation time: 12961597452926. Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 0%. System Idle 97%. Interval: 133017 ms.
2011-09-27 21:59:51.760 Server       Process 0:0:0 (0x18c0) Worker 0x000000000606A1A0 appears to be non-yielding on Scheduler 5. Thread creation time: 12961597452926. Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 2%. System Idle 94%. Interval: 193116 ms.
2011-09-27 22:00:51.860 Server       Process 0:0:0 (0x18c0) Worker 0x000000000606A1A0 appears to be non-yielding on Scheduler 5. Thread creation time: 12961597452926. Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 3%. System Idle 93%. Interval: 253215 ms.

If you look at the Errorlog snippet above, you will notice that there are entries for the highlighted worker address are same as the one reported in the management studio output screen shot above. The time reported in the Errorlog is the server time (in my case it is UTC+5:30) where as the time stamps reported by the health session is UTC time which means that the timestamps reported also match. Using the events from the System Health Session and the Errorlog entries, I will be able to figure out when a non-yielding condition occurred on the server instance.

The non-yielding condition report is quite plain and has a table output showing the different columns returned by the query above:

image

In the last post for this series, I shall provide the set of queries that can be used to parse a deadlock graph collected by the System Health session along with the .rdl files for the custom reports used in this series.
Advertisement