TroubleshootingSQL Bytes–CPU usage analysis with Excel 2013

A screencast showing the CPU usage statistics of a SQL Server 2012 instance retrieved using Power Pivot. The visualization has been built using Power View in Excel 2013. The nuts and bolts of how the visualization was created is available in the following blog post: PowerView and System Health Session–CPU health

TroubleshootingSQL–CPU usage analysis with Excel 2013
Advertisement

Profiler events for different scenarios

A few moons back I had blogged about the high frequency events in SQL Server. In this post, I shall talk about the common scenarios where profiler traces are collected for SQL Performance issues and what kind of events need to be captured in the server side traces that are configured for the data collection.

While configuring a profiler trace, always include all data columns for that event as that doesn’t have an overhead as much as the number of events being captured during the trace capture.

In this post, I shall cover two of the most common issues that CSS collects profiler data for addressing SQL Performance issues: Attentions/Timeouts and High CPU usage/Long Running queries.

Attentions or Timeouts

Attentions or Timeouts can occur due to client side application timeout setting (mostly this is the case) or due to database engine side issue (like failure to acquire memory, problem on the server side connectivity layer etc.). During this time, you will have a lot of queries starting but not completing successfully. If you are dealing with such a scenario, then you would need to capture the Starting events using a profiler trace to find out which statements didn’t complete and encountered the Attention event.

The events that need to be captured are:

All Database Events

Log File Auto Shrink
Data File Auto Shrink
Log File Auto Grow
Data File Auto Grow

All Errors and Warnings

Attention (Important for tracking the timeouts)
ErrorLog
EventLog
Exception
Hash Warning
Execution Warnings
Sort Warnings
Missing Column Statistics
Missing Join Predicate
Exchange Spill Event
Blocked process report
User Error Message
Background Job Error
Bitmap Warning
Database Suspect Data Page
CPU threshold exceeded

Lock:Deadlock Chain and Lock:Timeout (timeout > 0)

Performance: Auto Stats

Showplan All or Showplan XML (As execution plan will not be available for queries which didn’t complete)

Showplan Statistics Profile or Showplan XML Statistics Profile (For comparing estimated and actual query plan)

Security Audit: Audit Login, Audit Login Failed and Audit Logout

Stored Procedures events:

RPC:Starting
SP:Recompile
SP:Starting
SP:Completed
SP:StmtStarting
SP:StmtCompleted
RPC:Completed

Transactions: SQLTransaction is required for tracing back the transaction if there are nested transactions). DTCTransaction is required only if DTC transactions are getting timed out.

TSQL events:

SQL:BatchCompleted
SQL:BatchStarting
SQL:StmtStarting
SQL:StmtCompleted
SQL:StmtRecompile

Cursor events: CursorRecompile and CursorImplicitConversion (Required for checking if the requested cursor is being implicitly converted or if the cursor is being frequently recompiled).

The above events should be sufficient to track down any query timeouts and the common root causes for the same.

High CPU usage/Long Running queries

Another set of issues that are commonly dealt with while troubleshooting SQL Performance issues are Long Running Queries or Queries causing high CPU usage on the server. In such a case, the above set of events would be definitely be useful. If you are certain that the profiler trace can be captured for the entire duration of the query execution, then for highly transactional servers, you can omit the capture of the following events:

SQL: BatchStarting
SQL: StmtStarting
SP: StmtStarting
SP: Starting
RPC: Starting

The above logic of configuring a profiler trace will also help you capture trace data when you are trying to track down queries which are causing excessive amount of IO on the SQL instance.

Reference for capturing blocking information on a SQL instance: SQL 2000 Blocker Script output/SQL 2005 PerfStats

Keep in mind that the profiler data captured may not be sufficient for root cause analysis if you are not capturing SQL blocking information and performance monitor data in conjunction with the profiler traces.

In case you need additional events like deadlock events or lock events, then refer to my previous post and find out if they fall in the high frequency event category and when to capture the particular event.

TIP: As much as possible, avoid adding Text filters to your profiler trace configuration as it can lead to high CPU usage for highly active OLTP environments during filter evaluation.

Hope that the above information will help you reduce the excessive amount of data collection for troubleshooting SQL performance issues.

Approach to SQL Performance issues

Performance degradation can happen due to multiple reasons. The main bottlenecks that would affect performance are:

1. Memory: Physical or Virtual – Your server could be experiencing a physical crunch which would point us to the Available Physical RAM or a crunch on the Virtual Address Space. This would involve finding out the queries which are consuming high memory during compilation or execution and narrow down on the single biggest contributor and work towards rectifying the problem with the query.

2. Disk IO: There could be an issue with SQL Server queries running slowly due to large amount of IO being done by them or a disk performance issue.

3. CPU: It could be because the queries executing on your server are consuming high amount of CPU time. In such a case, it be statistics that are out-of-date or data that has increased which introduces data skew.

4. Blocking: The queries executing on the server could be blocked and waiting for a particular resource and thus increasing the query duration.

How to perform post-mortem analysis for SQL performance problems?

If you are facing a performance degradation on the SQL instance after an upgrade, we would need to analyze the PSSDIAG collected to find out the top bottlenecks. There are two ways to do this:

1. Comparison Analysis: You identify the a query batch or stored procedure which is a problem and provide us a baseline (w.r.t. normal execution time for the query in question). We capture a “fast” and a “slow” run and we perform a comparison between the two runs and we tell you what is biggest bottleneck: Memory, Disk, CPU or Database/Environment Settings like database statistics, query SET options, difference in parameters etc. Based on what is identified as the culprit, you will be able to identify/suggest action items to help remove the culprit.

2. Bottleneck Analysis: In case you do not have a baseline, you can perform a bottleneck analysis and then stack rank them for you in case there is more than one bottleneck. Based on the analysis, you will be able to identify the biggest bottleneck and can implement action items to mitigate the single largest bottleneck.

If you need to perform live troubleshooting on the server, then you can make use of DMVs if you are on SQL Server 2005 or above or use Performance Dashboard (mentioned below).

RML Utilities Download:

http://www.microsoft.com/downloads/details.aspx?FamilyId=7EDFA95A-A32F-440F-A3A8-5160C8DBE926&displaylang=en

More Info on how to use the RML Utilities:

http://blogs.msdn.com/psssql/archive/2007/12/18/rml-utilities-for-microsoft-sql-server-released.aspx

http://sqlcat.com/technicalnotes/archive/2008/02/01/precision-performance-for-microsoft-sql-server-using-rml-utilities-9-0.aspx

If you need to analyze blocking data on the server, then you can use SQL Nexus for the same.

SQL Nexus Download Link:

http://www.codeplex.com/sqlnexus

Another option would be to use SQL Server 2005 Performance Dashboard in case you are testing your application on SQL Server 2005. This tool lets you view your server status without running PSSDIAG and provides reports to identify Long Running queries and also identify which queries are consuming the highest amount of resources (IO/CPU/Memory).

Performance Dashboard:

http://www.microsoft.com/downloads/details.aspx?FamilyId=1d3a4a0d-7e0c-4730-8204-e419218c1efc

Performance Dashboard for SQL Server 2008

http://blogs.msdn.com/b/sqlserverfaq/archive/2010/05/27/sql-server-performance-dashboard-reports-in-ssms-introduction-install-datediff-error-amp-modified-for-sql-2008.aspx

How to find out the CPU usage information for the SQL Server process using Ring Buffers

SQL Server 2005 Ring Buffers stored information related to CPU usage. This information is updated every 1 minute. So at any given point of time you can trace back the CPU usage for 4 hours 15 minutes. A sample output below shows the SQL instance CPU utilization against the other processes (active on the server) CPU utilization. This will definitely help us identify whether the SQL Server process is actually driving CPU. 

Sample Output:

Event Time SQL Process Utilization System Idle Other Process Utilization
2009-11-19 20:00:49.947  0 96 4
2009-11-19 19:59:49.933  0 92 8
2009-11-19 19:58:49.910  0 91 9
2009-11-19 19:57:49.893  0 92 8

For SQL Server 2005: 

declare @ts_now bigint 

select @ts_now = cpu_ticks / convert(float, cpu_ticks_in_ms) from sys.dm_os_sys_info 

select record_id, 

dateadd(ms, -1 * (@ts_now - [timestamp]), GetDate()) as EventTime, 

SQLProcessUtilization, 

SystemIdle, 

100 - SystemIdle - SQLProcessUtilization as OtherProcessUtilization 

from ( 

select 

record.value('(./Record/@id)[1]', 'int') as record_id, 

record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') as SystemIdle, 

record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') as SQLProcessUtilization, 

timestamp 

from ( 

select timestamp, convert(xml, record) as record 

from sys.dm_os_ring_buffers 

where ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR' 

and record like '%<SystemHealth>%') as x 

) as y 

order by record_id desc

For SQL Server 2008: 

declare @ts_now bigint 

select @ts_now = ms_ticks from 

sys.dm_os_sys_info 

select record_id, dateadd (ms, (y.[timestamp] -@ts_now), GETDATE()) as EventTime,




SQLProcessUtilization, 

SystemIdle, 

100 - SystemIdle - SQLProcessUtilization as OtherProcessUtilization 

from ( 

select 

record.value('(./Record/@id)[1]', 'int') as record_id, 

record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') 

as SystemIdle, 

record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 

'int') as SQLProcessUtilization, 

timestamp 

from ( 

select timestamp, convert(xml, record) as record 

from sys.dm_os_ring_buffers 

where ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR' 

and record like '%<SystemHealth>%') as x 

) as y 

order by record_id desc