Tools Tips and Tricks #8 – Debug Diagnostic and Crash Rules

There are multiple situations where you might need to collect a dump for a process that generates an Access Violation, or to set up a crash rule for an application that crashes without generating a dump to investigate further. So today I am going to demonstrate a way to capture a crash dump of a process using the Debug Diagnostic Tool. The example shown here can be extended to SQL Server as well. Once you launch the Debug Diagnostic Tool, you will notice that the tool has three tabs, namely Rules, Advanced Analysis and Processes.

I am now going to walk through setting up a Crash Rule for an EXE called DebugDiag_CrashRule.exe, which I wrote to generate an Access Violation with no exception handler in the code, resulting in termination of the EXE.

The steps that you can use to set up the Crash Rule are:

1. Use the Rule tab and click on the Add Rule button. Select the option “Crash” in the Select Rule Type window and click on Next. (see screenshot on the left)
2. Then select “A specific process” option in the Select Target Type window and click on Next. (see screenshot on the right)
3. Next you are presented with an option to select a process from the list of active processes. I gave the process name in the Selected Processes text-box and didn’t select the “This process instance only” option. If you select this option, monitoring will be set up only for the particular instance that you have selected. If left unchecked, monitoring will be set up for any instance of the process spawned on the machine. (see screenshot on the right). Click on Next after you have given the appropriate process name or selected the process ID.
4. In the next screen, “Advanced Configuration”, I clicked on the Exceptions button, which lets you configure actions for First Chance Exceptions for the process being monitored. Click on the Add Exception button on the First Chance Exception Configuration window. Select Access Violation for the exception name and, for the Action Type, select Full User Dump. The Action Limit that I have set in this case is 6, which means that after 6 dumps have been captured by this rule, no further dumps will be generated. This is a good way of limiting the number of dumps generated for the process that you are monitoring. Notice that the Action Type for unconfigured first chance exceptions is set to None. This means that for any first chance exception generated apart from the one that you have configured, no action will be taken by the rule. (see screenshot below).


5. Next you get the “Select Dump Location and Rule Name” window, where you can change the rule name and the location where the dumps are generated. By default the dumps are generated in the <Debug Diagnostic Install Path>\Logs\<Rule Name> folder. (see screenshot on the left)
6. In the last window, use the Activate option to enable the rule.
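The Action Limit configured in step 4 is essentially a counter gate on the rule: each captured dump increments a counter, and once the limit is reached the rule takes no further action. A minimal sketch of that behavior, with illustrative names that are not part of the Debug Diagnostic tool's API:

```python
class ActionLimit:
    """Sketch of a rule that stops capturing after a configured number of dumps."""

    def __init__(self, limit):
        self.limit = limit      # e.g. 6, as configured in the crash rule above
        self.captured = 0

    def should_capture(self):
        # Allow a dump only while the number already captured is below the limit
        if self.captured < self.limit:
            self.captured += 1
            return True
        return False

rule = ActionLimit(6)
# The first six first-chance Access Violations produce dumps; later ones do not.
outcomes = [rule.should_capture() for _ in range(8)]
```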

Now that the rule is active, when I launch my EXE, I can see that the Being Debugged column has the value “Yes” for DebugDiag_CrashRule.exe under the Processes tab. I launched my application once and this generated 2 dumps. The Rules tab shows you the rule configured, the status of the rule, the number of dumps captured by it, and the dump location. (see screenshot above)

Now that the data is captured, the tool allows you to perform Crash Analysis using the Advanced Analysis tab. Using the Analyze Data button, you can analyze the dump files together and get an HTML report of the dumps captured. The analysis of the dumps should be done on a non-production machine to avoid the overhead of downloading symbols and performing the analysis. The symbol search path can be provided using the Tools –> Options and Settings –> Folders and Search Path tab. The symbol search path would be: SRV*<local folder>*http://msdl.microsoft.com/download/symbols. A local folder is always good to have so that the symbols used are cached locally if required later. This prevents having to connect to the Microsoft Symbol Server every time you analyze a dump.
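The SRV* syntax above follows a fixed SRV*<local cache>*<symbol server> pattern, so it is easy to assemble programmatically if you script your analysis setup. A small sketch (the cache folder name is an example, not a requirement of the debugger):

```python
def symbol_path(cache_dir, server="http://msdl.microsoft.com/download/symbols"):
    # SRV*<local cache>*<symbol server>: downloaded PDBs are cached in the
    # local folder so later analyses do not hit the symbol server again.
    return "SRV*{0}*{1}".format(cache_dir, server)

print(symbol_path(r"D:\Symbols"))
# SRV*D:\Symbols*http://msdl.microsoft.com/download/symbols
```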

This is what I see in the DebugDiag_CrashRule__PID__3576__Date__05_24_2011__Time_12_59_56AM__612__First Chance Access Violation.dmp file. It tells me that Thread 0 generated the dump. As you can see in the screenshot below, the report gives you the call stack as well. The Exception information shows that an Access Violation was raised while trying to write to a memory location by a sprintf function called by the DebugDiag_CrashRule EXE.

Exception Information

In DebugDiag_CrashRule__PID__6476__Date__05_24_2011__Time_12_08_12AM__341__First Chance Access Violation.dmp the assembly instruction at msvcr100!vcwprintf_s+2f07 in C:\Windows\System32\msvcr100.dll from Microsoft Corporation has caused an access violation exception (0xC0000005) when trying to write to memory location 0x3f9721d0 on thread 0


The HTML report has a similar output for the DebugDiag_CrashRule__PID__3576__Date__05_24_2011__Time_12_59_57AM__647__Second_Chance_Exception_C0000005.dmp file as well. An additional file DebugDiag_CrashRule__PID__3576__Date__05_24_2011__Time_12_59_56AM__612__ProcessList.txt is also generated which contains a list of processes running on the machine when the dump was generated. All the reports are saved in the <Debug Diagnostic install path>\Reports folder by default.

Hope this makes it easier to capture crash dumps for processes which are terminating without giving you much information in the Windows event logs or the logs generated by the application. This is a tool that is commonly used by CSS to configure crash rules on a machine and capture crash dumps.

Once you have collected the required data, you can go to the Rules tab, right click on the rule and click on De-activate rule. Once the rule status shows up as Not Active, you can choose to remove the rule by right-clicking on the rule and selecting the Remove Rule option.

The Debug Diagnostic Tool v1.1 available on the Microsoft Downloads site doesn’t have an x64 version that lets you configure rules and monitor processes on Windows Server 2008/R2/Windows 7/Windows Vista; the 64-bit version of that release only has the Advanced Analysis option. The Debug Diagnostic Tool v1.2 available from here has a 64-bit download (Beta version) which allows you to configure rules and monitor processes on those operating systems.

TIP: Here is a quick tip on taking dumps of a SQL Server process using the Debug Diagnostic tool. If you find that your SQL Server process is hung and you cannot log into the SQL instance using the Dedicated Administrator Connection (DAC), or don’t have DAC enabled for the instance, you can take a mini-dump using the Processes tab.

Steps
1. Click on the Processes tab.
2. Locate the process name sqlservr.exe. If you have multiple instances, you can identify the correct instance using the Active Services column, which gives you the instance name. (see screenshot below)
3. Right-click on the process name and click on the Create Full Userdump or Create Mini Userdump option to generate a full or mini memory dump of the selected process. The dump location will be the default location configured under Tools –> Options and Settings –> Folders and Search Path tab –> Manual Userdump Save Folder. By default it is <Debug Diagnostic install path>\Logs\Misc.

You might also encounter a situation where you need to take a dump of a process every n seconds or minutes. This can be done by right-clicking on the process name and selecting the Create Userdump series… option. This will give you a window with the set of options seen in the screenshot on the left. As you can see, it lets you configure a number of options for the process, namely:

1. Time interval between each dump created
2. When to start the timer, before or after the dump file writing completes.
3. When to stop generating the dumps
4. Whether to collect Full or Mini dumps
5. Whether to collect a single Full user dump before or after a Mini-dump series for the process completes
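Option 2 above (whether the interval timer restarts before or after the dump file finishes writing) changes the effective spacing of the series. A small sketch of the two schedules, with illustrative durations rather than anything taken from the tool:

```python
def dump_start_times(interval_s, write_time_s, count, restart_after_write=True):
    """Return when each dump in the series begins, in seconds from the start."""
    times, t = [], 0
    for _ in range(count):
        times.append(t)
        if restart_after_write:
            # Timer only starts once the previous dump file is fully written
            t += write_time_s + interval_s
        else:
            # Timer runs while the dump is being written
            t += max(interval_s, write_time_s)
    return times

# A 10-second interval with dumps that take 5 seconds to write:
dump_start_times(10, 5, 3, restart_after_write=True)   # [0, 15, 30]
dump_start_times(10, 5, 3, restart_after_write=False)  # [0, 10, 20]
```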

Tess Ferrandez [Blog], an Escalation Engineer in CSS, also has a pretty detailed blog post on how to setup a crash rule for ASP.NET applications.

Note: For a 64-bit process (e.g. a 64-bit SQL Server instance), which has a large process address space, it is advisable to configure a User Mini Dump as the exception action instead of a Full User Dump. When a dump file is generated, the process threads are frozen to take a snapshot. For a large dump on a system with slow disk I/O, you might end up with SQL outages, and on a cluster this can result in a failover during the dump generation period.

Tools Tips and Tricks #7: PsExec as parent and ProcMon as child

A new week begins and the Tools, Tips and Tricks series continues. Today I shall talk about using PsExec (part of the PsTools package) to collect data from a remote box. PsExec lets you execute processes on other systems without having to manually install client software.

The scenario that I will discuss today is collection of a Process Monitor trace on a server without logging onto the machine. This is quite useful if you need to collect a Process Monitor trace from one of the passive nodes while running SQL Server 2005 setup.

The .cmd file that I am using in this case has the following commands:


set PM=D:\ProcMon\procmon.exe

rem Start Process Monitor silently and write events to a local backing file
start %PM% /quiet /minimized /AcceptEULA /backingfile D:\ProcMon\notepad.pml

rem Block until the Process Monitor instance is up and idle
%PM% /waitforidle

rem Crude wait of about 9 seconds: ten pings to localhost, one second apart
ping 127.0.0.1 -n 10 -w 100

rem Stop the capture and close Process Monitor
%PM% /terminate

I am running the .cmd file to spawn Process Monitor on a different server and capture a trace using a local file as the backing file. We have to run Process Monitor from the command line because, while running setup, it’s recommended that you do not log into the passive nodes using Terminal Services or in console mode.

The command that I used to execute the commands present in the .cmd file is

PsExec \\<server name> -c RemoteProcMon.cmd

If you do not want to use the ping method to simulate waits, you can use the SLEEP utility available with the Windows 2003 Resource Kit, which takes the time to wait as a parameter. If you need to set filters for Process Monitor, run the tool in GUI mode, set and save the filters, then exit Process Monitor and run the above command from the command line. Alternatively, you can create a configuration file and use the /LoadConfig command-line switch for Process Monitor.
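For reference, the ping line in the script above works purely as a timer: ten echo requests to 127.0.0.1 go out roughly one second apart and return instantly, so the script pauses for about nine seconds. A back-of-the-envelope sketch of that estimate (the one-second gap is the typical default, not something the script configures):

```python
def ping_wait_seconds(n_requests, gap_s=1):
    # Replies from localhost are immediate, so the wait is just the
    # inter-request gap repeated between successive echo requests.
    return max(n_requests - 1, 0) * gap_s

ping_wait_seconds(10)  # about 9 seconds, matching "ping 127.0.0.1 -n 10"
```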

Again the possibilities are endless and you can extend this by using the command line options for PsExec and Process Monitor.

Use the options mentioned in Tools Tips and Tricks #1: Process Monitor so that the filters are correctly set and the page file doesn’t get overwhelmed if you are capturing data for long periods or on servers where there is high amount of activity.

Tools Tips and Tricks #6: Custom Reports in SQL Nexus

Here is another post in the Tools Tips and Tricks series, covering a SQL Nexus feature that is not widely used. Yes, SQL Nexus is one of my favorite tools (provided it is used correctly). Automation of the analysis helps get the picture quickly, but correlating the data is still up to the person performing the analysis.

Today I shall show you how you can use the custom reports feature by creating simple RDL files in Visual Studio and getting SQL Nexus to use them.

As you can see in the screenshot below, I have a few reports which are not part of the standard installation of SQL Nexus available on the CodePlex site.

I had built this report ages ago when RML Utilities didn’t have the option of drilling through the Interesting Events and fetching out the sub-class reason for the Interesting Event.


The report structure itself is quite simple. It uses the Shared Data Source “sqlnexus.rds” so that the database context can switch when you change the database name using the drop-down menu in the SQL Nexus tool. Then I created an Action Event to jump to a child report called “Event Drilldown.rdlc”. This gives you the view shown in the screenshot below: on which database the AutoStats event was fired, and the reason AutoStats kicked in.


Once you have built the custom reports, you can drop them in the following folder: %appdata%\SQLNexus\Reports. SQL Nexus will automatically pick up these reports when it is launched.
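If you script the deployment of such reports, the target folder can be derived from the APPDATA environment variable. A hedged sketch of that resolution (pass the value explicitly on a non-Windows machine, where the variable will not exist):

```python
import os

def nexus_reports_dir(appdata=None):
    """Resolve the SQL Nexus custom-reports folder, %appdata%\\SQLNexus\\Reports."""
    appdata = appdata or os.environ.get("APPDATA", "")
    return os.path.join(appdata, "SQLNexus", "Reports")

nexus_reports_dir(r"C:\Users\me\AppData\Roaming")
```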

The two files that are used in the above example can be downloaded from here. You can dissect the structures of the report to understand what queries were used to fetch the data for the reports shown above.

The ReadTrace_Main report for RML Utilities provides this drill-down function currently using the “Interesting Events” link on the main dashboard under “Additional Reports”.

Have a good weekend and stay tuned to this series for more tips and tricks next week!

Download link for RDL files: Client Report Definition (.rdlc) Files

Tools Tips and Tricks #5: SQLDIAG and RANU

During the course of the Tools Tips and Tricks series, I had blogged about how to connect to a RANU instance and how to find out the SQL Express user instances active on the machine. Today, I shall discuss how to use SQLDIAG to collect data from a SQL Express user instance.

Since a SQL Express User Instance involves the dynamic spawning of a new instance and connections are allowed locally only using named pipes, normal instance discovery doesn’t work. If you need to collect diagnostic data using SQLDIAG for such an instance, you need to take the following steps:

1. Start the user instance by instantiating a connection to the user instance from the application that uses it.
2. Use any method mentioned in my previous post to get the instance GUID and the named pipe string.
3. Construct a named pipe alias using SQL Server Configuration Manager (see screenshot below). Use SQL Native Client Configuration 32-bit or 64-bit as appropriate for your version of SQL Express.


The Alias Name is in the form <server name>\<user instance GUID>. The pipe name is what you obtained for the user instance that you are interested in monitoring. Leave the Server field blank.
4. Once the alias is created, test if you can connect to it using SQLCMD -S.\BA78E627-AD14-4 -E locally from the machine that has SQL Express installed on it.
5. Now, in the SQLDIAG.XML configuration file that you are using, put the server name and instance name as follows:

<Machine name=".">
….
<Instance name="BA78E627-AD14-4" windowsauth="true" ssver="10.5" user="">

After this you can run SQLDIAG from the machine which has SQL Express installed on it as user instances don’t accept remote connections.

Check back tomorrow for another new trick for tools that we use!

Tools Tips and Tricks #4: RML Utilities

It is time again for your friendly neighborhood (or virtual neighborhood) SQL CSS person to provide you with the next set of tips for the various tools that he uses in his day-to-day work. I probably use this tool “RML Utilities” or the database created by this at least once a day and sometimes even more. Not kidding!!

This tool has become synonymous with the analysis of Profiler traces collected from customer environments in CSS. The majority of you might be familiar with the tool “SQL Nexus”, which calls RML Utilities in the background to load the collected Profiler traces into a SQL Server database. This calls an executable, ReadTrace.exe. In this post, I will talk about a few conditions when you should call ReadTrace.exe directly instead of invoking it from SQL Nexus.


As you can see above, SQL Nexus provides you the option of breaking down the activity of each SPID into individual .trc files, but they can only be directed to the %TEMP%\RML folder on your machine. The previous server that I had didn’t have sufficient disk space to hold over 100 GB of data in the TEMP directory. Yes, we do analyze data which runs into a few hundred GBs at times!

Now you have two options: change the %TEMP% environment path on your machine to a different drive, or use ReadTrace.exe to generate the .trc files in the required path.

Another option that cannot be controlled from SQL Nexus is the ability to add SPID, Hostname and Application filters while importing the data.

Some of the ReadTrace defaults, which apply to any profiler trace import, are shown below:

-o Generate output files in the current directory
-S Connect to the default SQL Server instance on the machine
-d Load profiler data into the PerfAnalysis database
-E Use Windows authentication while connecting

Here are a few scenarios where I use ReadTrace.exe directly….

Analyzing a deadlock scenario where I know the Session IDs involved

Sometimes the deadlock graph is not sufficient to figure out why a lock is still held by that session and why that session acquired the lock on the object in the first place. At that point, you need to track down the sequence of statements executed by the session. In this case, I will use the following command to import the data.

readtrace -S. -E -ddbRMLUtilDemo
-I"D:\RML\RMLUtil_demo.trc" -A"!SQLCMD" -s53 -s55 -MF -o"D:\RML\output" -f

The above command will import the trace files into a database dbRMLUtilDemo on the default instance on the machine with a SPID filter of 53 and 55 and exclude events generated with the application name SQLCMD. Additionally, individual .trc files will be generated containing the activity of SPIDs 53 and 55.

Generating .RML files to be used by OStress

Use the -o parameter to change the output path, and use the -s, -A and -H filters to apply the necessary SPID, application name and host name filters respectively. Here is an example applying filters for SPIDs 51, 52 and 53, and excluding events generated by the application name PSSDIAG or the hostname MyLaptop.

readtrace -S. -E -ddbRMLUtilDemo
-I"D:\RML\RMLUtil_demo.trc" -A"!SQLCMD" -s51 -s52 -s53 -o"D:\RML\output" -H"!MyLaptop" -A"!PSSDIAG"

The command line window and the ReadTrace.log file available in the output directory will confirm if the filters were applied correctly or not. Below is a screen shot of the command line window showing that the filters were applied correctly:


Handling MARS sessions

ReadTrace has limited processing ability for sessions using MARS (Multiple Active Result Sets). You have to use -f and -T35 to import data from trace files which contain MARS sessions. ReadTrace uses the formula ((Session Id * 10) + BatchId) so that a MARS session appears as a separate session. E.g. if you see Session IDs 1620, 1621 and so on in the ReadTrace/SQL Nexus database but cannot find such a session in the DMV outputs captured during that time, then the activity corresponds to Session ID 162, which was using MARS.
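The MARS renumbering formula is easy to apply in reverse when mapping a ReadTrace session back to a real SPID. A sketch of both directions (this assumes batch IDs below 10, which the times-ten multiplier implies):

```python
def mars_composite(session_id, batch_id):
    # ReadTrace's formula: (Session Id * 10) + BatchId
    return session_id * 10 + batch_id

def mars_decode(composite):
    # Reverse the mapping: drop the last digit to recover the real session
    return divmod(composite, 10)   # (session_id, batch_id)

mars_composite(162, 0)   # 1620, as seen in the ReadTrace/SQL Nexus database
mars_decode(1621)        # (162, 1): the activity belongs to real session 162
```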

Note: Remember to always use a server-side trace whenever possible to capture a profiler trace output to a local disk.

The PDF help files for RML Utilities are located at <system drive>:\Program Files\Microsoft Corporation\RMLUtils\Help folder by default on a machine where RML Utilities is installed in case you wanted more information on the above.

RML Utilities Download
http://blogs.msdn.com/b/psssql/archive/2008/11/12/cumulative-update-1-to-the-rml-utilities-for-microsoft-sql-server-released.aspx

Keep following the tag “Tools tips and Tricks” for more posts on the various tools that we use. That’s it for today!