A screencast showing the CPU usage statistics of a SQL Server 2012 instance retrieved using Power Pivot. The visualization has been built using Power View in Excel 2013. The nuts and bolts of how the visualization was created is available in the following blog post: PowerView and System Health Session–CPU health
A few moons back I had blogged about the high frequency events in SQL Server. In this post, I shall talk about the common scenarios where profiler traces are collected for SQL Performance issues and what kind of events need to be captured in the server side traces that are configured for the data collection.
While configuring a profiler trace, always include all data columns for that event as that doesn’t have an overhead as much as the number of events being captured during the trace capture.
In this post, I shall cover two of the most common issues that CSS collects profiler data for addressing SQL Performance issues: Attentions/Timeouts and High CPU usage/Long Running queries.
Attentions or Timeouts
Attentions or Timeouts can occur due to client side application timeout setting (mostly this is the case) or due to database engine side issue (like failure to acquire memory, problem on the server side connectivity layer etc.). During this time, you will have a lot of queries starting but not completing successfully. If you are dealing with such a scenario, then you would need to capture the Starting events using a profiler trace to find out which statements didn’t complete and encountered the Attention event.
The events that need to be captured are:
All Database Events
Log File Auto Shrink
Data File Auto Shrink
Log File Auto Grow
Data File Auto Grow
All Errors and Warnings
Attention (Important for tracking the timeouts)
Missing Column Statistics
Missing Join Predicate
Exchange Spill Event
Blocked process report
User Error Message
Background Job Error
Database Suspect Data Page
CPU threshold exceeded
Lock:Deadlock Chain and Lock:Timeout (timeout > 0)
Performance: Auto Stats
Showplan All or Showplan XML (As execution plan will not be available for queries which didn’t complete)
Showplan Statistics Profile or Showplan XML Statistics Profile (For comparing estimated and actual query plan)
Security Audit: Audit Login, Audit Login Failed and Audit Logout
Stored Procedures events:
Transactions: SQLTransaction is required for tracing back the transaction if there are nested transactions). DTCTransaction is required only if DTC transactions are getting timed out.
Cursor events: CursorRecompile and CursorImplicitConversion (Required for checking if the requested cursor is being implicitly converted or if the cursor is being frequently recompiled).
The above events should be sufficient to track down any query timeouts and the common root causes for the same.
High CPU usage/Long Running queries
Another set of issues that are commonly dealt with while troubleshooting SQL Performance issues are Long Running Queries or Queries causing high CPU usage on the server. In such a case, the above set of events would be definitely be useful. If you are certain that the profiler trace can be captured for the entire duration of the query execution, then for highly transactional servers, you can omit the capture of the following events:
The above logic of configuring a profiler trace will also help you capture trace data when you are trying to track down queries which are causing excessive amount of IO on the SQL instance.
Keep in mind that the profiler data captured may not be sufficient for root cause analysis if you are not capturing SQL blocking information and performance monitor data in conjunction with the profiler traces.
In case you need additional events like deadlock events or lock events, then refer to my previous post and find out if they fall in the high frequency event category and when to capture the particular event.
TIP: As much as possible, avoid adding Text filters to your profiler trace configuration as it can lead to high CPU usage for highly active OLTP environments during filter evaluation.
Hope that the above information will help you reduce the excessive amount of data collection for troubleshooting SQL performance issues.
Performance degradation can happen due to multiple reasons. The main bottlenecks that would affect performance are:
1. Memory: Physical or Virtual – Your server could be experiencing a physical crunch which would point us to the Available Physical RAM or a crunch on the Virtual Address Space. This would involve finding out the queries which are consuming high memory during compilation or execution and narrow down on the single biggest contributor and work towards rectifying the problem with the query.
2. Disk IO: There could be an issue with SQL Server queries running slowly due to large amount of IO being done by them or a disk performance issue.
3. CPU: It could be because the queries executing on your server are consuming high amount of CPU time. In such a case, it be statistics that are out-of-date or data that has increased which introduces data skew.
4. Blocking: The queries executing on the server could be blocked and waiting for a particular resource and thus increasing the query duration.
How to perform post-mortem analysis for SQL performance problems?
If you are facing a performance degradation on the SQL instance after an upgrade, we would need to analyze the PSSDIAG collected to find out the top bottlenecks. There are two ways to do this:
1. Comparison Analysis: You identify the a query batch or stored procedure which is a problem and provide us a baseline (w.r.t. normal execution time for the query in question). We capture a “fast” and a “slow” run and we perform a comparison between the two runs and we tell you what is biggest bottleneck: Memory, Disk, CPU or Database/Environment Settings like database statistics, query SET options, difference in parameters etc. Based on what is identified as the culprit, you will be able to identify/suggest action items to help remove the culprit.
2. Bottleneck Analysis: In case you do not have a baseline, you can perform a bottleneck analysis and then stack rank them for you in case there is more than one bottleneck. Based on the analysis, you will be able to identify the biggest bottleneck and can implement action items to mitigate the single largest bottleneck.
If you need to perform live troubleshooting on the server, then you can make use of DMVs if you are on SQL Server 2005 or above or use Performance Dashboard (mentioned below).
RML Utilities Download:
More Info on how to use the RML Utilities:
If you need to analyze blocking data on the server, then you can use SQL Nexus for the same.
SQL Nexus Download Link:
Another option would be to use SQL Server 2005 Performance Dashboard in case you are testing your application on SQL Server 2005. This tool lets you view your server status without running PSSDIAG and provides reports to identify Long Running queries and also identify which queries are consuming the highest amount of resources (IO/CPU/Memory).
Performance Dashboard for SQL Server 2008
SQL Server 2005 Ring Buffers stored information related to CPU usage. This information is updated every 1 minute. So at any given point of time you can trace back the CPU usage for 4 hours 15 minutes. A sample output below shows the SQL instance CPU utilization against the other processes (active on the server) CPU utilization. This will definitely help us identify whether the SQL Server process is actually driving CPU.
|Event Time||SQL Process Utilization||System Idle||Other Process Utilization|
For SQL Server 2005:
declare @ts_now bigint select @ts_now = cpu_ticks / convert(float, cpu_ticks_in_ms) from sys.dm_os_sys_info select record_id, dateadd(ms, -1 * (@ts_now - [timestamp]), GetDate()) as EventTime, SQLProcessUtilization, SystemIdle, 100 - SystemIdle - SQLProcessUtilization as OtherProcessUtilization from ( select record.value('(./Record/@id)', 'int') as record_id, record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)', 'int') as SystemIdle, record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)', 'int') as SQLProcessUtilization, timestamp from ( select timestamp, convert(xml, record) as record from sys.dm_os_ring_buffers where ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR' and record like '%<SystemHealth>%') as x ) as y order by record_id desc
For SQL Server 2008:
declare @ts_now bigint select @ts_now = ms_ticks from sys.dm_os_sys_info select record_id, dateadd (ms, (y.[timestamp] -@ts_now), GETDATE()) as EventTime, SQLProcessUtilization, SystemIdle, 100 - SystemIdle - SQLProcessUtilization as OtherProcessUtilization from ( select record.value('(./Record/@id)', 'int') as record_id, record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)', 'int') as SystemIdle, record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)', 'int') as SQLProcessUtilization, timestamp from ( select timestamp, convert(xml, record) as record from sys.dm_os_ring_buffers where ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR' and record like '%<SystemHealth>%') as x ) as y order by record_id desc