System Health Session: Part 2


I had written an introductory post on monitoring the system health using the default extended events sessions that runs on a SQL Server 2008 instance and above. Now it is time for next part for this post. In the first post, I provided a set of queries which would be used for getting all the errors that were recorded by the System Health Extended Events session. Now I can create a set of reports using Business Intelligence Development Studio which can be used by the Custom Reports feature of Management Studio.

I put together a three-level drill down report to get a summary report for all the errors reported by the T-SQL queries in my previous post. The dashboard report which will basically serve as the landing page for what I am terming as the System Health Dashboard looks something like this: image

As you can see above, the report shows me the different events captured by the Extended Events session. The first level drill down provides a summary of all the different errors reported.

image

The last drill-down option is to go look into every occurrence of a specific error number. This report basically shows all the occurrences of a specific error number along with the query text (if captured) and specifics for the event recorded rather than the generic error description that you see in the above report.

image

The other category of events captured by the Health Session are wait information which fall under the following category:

  • Any sessions that have waited on latches (or other interesting resources) for > 15 seconds.
  • Any sessions that have waited on locks for > 30 seconds.
  • Any sessions that have waited for a long time for preemptive waits. The duration varies by wait type. A preemptive wait is where SQL Server is waiting for external API calls.

The query below will give you the query which experienced the wait along with the wait times which were recorded by the System Health session.

SELECT CAST(xet.target_data AS XML) AS XMLDATA

INTO #SystemHealthSessionData

FROM sys.dm_xe_session_targets xet

JOIN sys.dm_xe_sessions xe

ON (xe.address = xet.event_session_address)

WHERE xe.name = 'system_health'

;WITH CTE_HealthSession (EventXML) AS

(

SELECT C.query('.') EventXML

FROM #SystemHealthSessionData a

CROSS APPLY a.XMLDATA.nodes('/RingBufferTarget/event') as T(C)

WHERE C.query('.').value('(/event/@name)[1]', 'varchar(255)') in ('wait_info','wait_info_external')

)

SELECT

EventXML.value('(/event/@timestamp)[1]', 'datetime') as EventTime,

EventXML.value('(/event/data/text)[1]', 'varchar(50)') as WaitType,

EventXML.value('(/event/data/value)[3]', 'int') as Duration,

EventXML.value('(/event/data/value)[4]', 'int') as Max_Duration,

EventXML.value('(/event/data/value)[5]', 'int') as Total_Duration,

EventXML.value('(/event/action/value)[2]', 'varchar(10)') as Session_ID,

EventXML.value('(/event/action/value)[3]', 'varchar(max)') as sql_text

FROM CTE_HealthSession

ORDER BY EventTime DESC

DROP TABLE #SystemHealthSessionData

A sample output of the above query is shown above. The .sql file for the above query can be downloaded from here.

image

Note: Beware of false positives for PREEMPTIVE_OS_GETPROCADDRESS waits described here.

In the next post, I shall give a preview of the wait reports and provide another set of queries to track a category of events tracked by the System Health session.

Once I have completed this series, I shall upload all the series of reports to my SkyDrive so that they can be downloaded for your benefit.

Advertisements

Sleeping session ID: Where art thou?


This has been an age old confusion ever since the Dynamic Management Views changed the way SQL Performance monitoring was done.

A SPID/Session ID will show up as Sleeping/Awaiting Command when it has no active work request that it needs to execute on the Database Engine. Consider the following scenario:

1. You connect to a SQL instance using Management Studio

2. You execute: SELECT @@SPID

3. You then open another SSMS query window and try and lookup the Session ID returned by the output of Step #2 using the DMV: sys.dm_exec_requests.

If you didn’t know of this behavior, then you will be in for a surprise! This is because the Session ID associated with the query in Step #2 has no active work to perform which is why the DMV didn’t report a row for that particular session. However, when you look up this session id using sys.dm_exec_connections or sys.sysprocesses. The reason sys.sysprocesses reports this SPID is because it doesn’t differentiate between a session with/without an active work request. It will report all SPIDs currently connected to the database engine.

The above behavior is expected and by-design.

Bob Dorr has mentioned about this in his post on the CSS SQL Escalation blog and also talks about how this can affect concurrency if such a session has open transactions:

How It Works: What is a Sleeping / Awaiting Command Session

Powershell script to calculate folder size


Just published a PS Script to calculate the TOP 5 folders for a UNC share. The need for this script arose when I needed to find out which folders were consuming the largest amount of disk space on a File Server. I know there are graphical utilities like DiskMon, Space Monger etc. but this script provides me the flexibility of automation like identifying the folder owners or deleting based on some criteria etc.

Click here to download the script.