T-SQL Tuesday #18: CTEs, XML and Process Monitor

T-SQL TuesdayThis month’s recurring SQL blog party a.k.a. T-SQL Tuesday started by Adam Machanic (blog | @AdamMachanic) is being hosted by Bob Pusateri (blog | @SQLBob). This month’s topic is about Common Table Expressions which was introduced for the first time in SQL Server 2005. CTEs can be thought of as a derived table which can be self-referenced within a query. With the advent of CTEs, a lot of T-SQL developers made full use of this feature. Personally, I have used CTEs for various data collection and parsing scripts that I use for daily for my day-to-day work. I am a big fan of CTEs and have introduced their usage in one of the blocking reports for SQL Nexus. As with any coding construct, the use of CTEs has it’s PROs and CONs but in this post, I shall not go down the path as there are a lot of depends on whether a particular scenario is fit for the use of CTE of a temporary table. That decision has to be made based on testing against the benchmarks that you have or response-time SLAs that have defined for your application.

Since my primary expertise lies with SQL Server, I examine every opportunity of data analysis with the intention of importing the data into a SQL Server database table and then writing an automation routine over it to help me save the repititive iterations when I am analyzing the same data or need to perform the same set of steps over and over again. Recently I have shifted over to Powershell due to affinity for scripting but that will be a story for some other blog post. CSS uses Process Monitor, a vastly popular tool from Sysinternals, to troubleshoot various problems that we encounter while working on SQL Server related issues. Some of our adventures with Process Monitor are mentioned below:

PRB: SQL Backups to a UNC path fail with OS Error 1 (Incorrect Function)
INF: Permissions required for SQL Server Service account to use SSL certificate

During my tenure here with CSS, I too have relied on the Process Monitor to troubleshoot some fairly complex issues. Sometimes the Process Monitor .pml files can be quite large if collected without adding appropriate filters and I am just interested in a particular process(es). By importing the .PML file into a database, I can not only query the imported data using T-SQL queries but I can also write quick automations to look for common stack traces that I know have known issues/problems associated with them. Yes with Process Monitor, you can use locally cached symbols to get the call stack of the function call as well. Note that saving a .PML into XML with resolved stack symbols is slow due to the time taken for symbol resolution. The .XML file that I will use as an example was saved using the “Extensible Markup Language (XML)” option with “Include stack traces (will increase file size)” option enabled. I captured a Process Monitor trace while trying to access a directory which was not present on my server.

I already have a stored procedure to import the saved .XML file into a database table, then parse that existing data and insert the parsed data into the two tables that I have created to stored the event and stack information.

How does the use of CTEs fit in here?

The .XML file has the following structure for the stack and the frames:

<ProcessIndex>2083</ProcessIndex>
<Time_of_Day>5:52:38.8471548 PM</Time_of_Day>
<Process_Name>Explorer.EXE</Process_Name>
<PID>6588</PID>
<Operation>NotifyChangeDirectory</Operation>
<Path>D:\Tools</Path>
<Result></Result>
<Detail>Filter: FILE_NOTIFY_CHANGE_FILE_NAME, FILE_NOTIFY_CHANGE_ATTRIBUTES, FILE_NOTIFY_CHANGE_LAST_WRITE</Detail>
<stack>
<frame>
<depth>0</depth>
<address>0xfffff880013ab027</address>
<path>C:\Windows\system32\drivers\fltmgr.sys</path>
<location>fltmgr.sys + 0x2027</location>
</frame>

<frame>
<depth>1</depth>
<address>0xfffff880013abbe9</address>
<path>C:\Windows\system32\drivers\fltmgr.sys</path>
<location>fltmgr.sys + 0x2be9</location>
</frame>
….
….
</stack>

Now each operation has a call-stack associated with it which I am interested in importing into my database. But with the existing schema above, I cannot get the information that I require . After exploring the use of multiple ways of parsing the data using temporary variables, temporary tables, recursive CTEs, XML indexes and blah.. blah.. , I found that that quickest way to do this was:

1. Get the event list imported into a temporary table
2. Using the event list generated above, create the call stacks from the <frame> nodes (see code snippet below)

The logic here is that the cteXML used below extracts all frames for a particular event using a counter variable and creates XML data for each of the frames which are parsed by referencing the cteXML in the INSERT INTO…SELECT query.

-- Extract events from the XML file and put them into a staging table

INSERT INTO #tblXMLstaging (event,fname)

SELECT C.query('.') as event,@fname

FROM dbo.tbl_XMLData

CROSS APPLY xmldata.nodes('/procmon/eventlist/event') as T(C)

WHERE fname = @fname

-- Fetch values from staging table, store parsed frames in an XML and parse them to insert data into procmon_stacks table

WITH ctexml as

(

SELECT TOP 15 eventid,C.query('.') as frames,fname

FROM #tblXMLstaging

CROSS APPLY event.nodes('/event/stack/frame') as T(C)

WHERE eventid = @counter

)

INSERT INTO dbo.tblProcMon_Stacks (StackID, FName, FrameDepth, [Address], [Path], Location)

SELECT

eventid as StackID,

fname,

frames.value ('(/frame/depth)[1]', 'int') as FrameDepth,

frames.value ('(/frame/address)[1]', 'varchar(50)') as [address],

frames.value ('(/frame/path)[1]', 'varchar(255)') as [Path],

frames.value ('(/frame/location)[1]', 'varchar(255)') as [location]

FROM ctexml

The output of

If you are curious as to how long it takes to import the data, it took me a little over a minute (84 seconds to be precise) to load a 100MB XML file, parse and shred it! I am sure that there are optimizations possible to this method. The screenshot on the left shows the load times. The system specifications on where the data import is being done is: 

Operating System: Microsoft Windows Server 2008 R2 Enterprise (x64)

Machine: Hewlett-Packard HP Z800 Workstation

Processor: 2 Quad Core [Intel(R) Xeon(R) CPU E5506  @ 2.13GHz, 2128 Mhz, 4 Core(s), 4 Logical Processor(s)]

RAM: 16.0 GB

Hard Disk: Barracuda 7200.12 SATA 3Gb/s 500GB Hard Drive

SQL Server: SQL Server 2008 R2
Now that the data is available in a database table, I can query the information using T-SQL queries. I had attempted to open the directory C:\Foo on my server using Windows Explorer. This information and the callstack can be visible using the following T-SQL queries:

-- Get the event information associated with the directory that I was looking up
select ProcessName, PID, Operation, Path, Result, Detail, StackID from dbo.tblprocmon_events

where fname = 'FileNotFound.xml'

and path like '%C:\Foo%'

order by StackID

-- Get information for the call-stack using the StackID obtained above
select FrameDepth, address, Path, Location

from tblProcmon_stacks

where fname = 'FileNotFound.xml' and StackID = 662

order by FrameDepth

Below is a screenshot showing me that the directory doesn’t exist and the call-stack associated with the function call that failed.

The options are now limitless, you can take this forward by creation for common callstacks in-case you are troubleshooting an application developed by you and have access to the private symbols. Or you could look for common results for certain operations like Operation = QueryDirectory and Result = NO SUCH FILE. My use of CTEs are for the following tasks:

1. Parsing XML data as shown here
2. Extracting useful information from Ring Buffers
3. Tracking available contiguous memory used and available in the SQL Server process address space

The scripts containing the Stored Procedure definitions and table schemas can be downloaded from here.

Advertisement