XML Plans Saga – Twilight

XML plans were a new addition in SQL Server 2005 and above which enabled a DBA to fetch additional information from a SQL Server query plan. In this blog post, I shall list the useful information that can be fetched from a query plan and that is pertinent to your performance tuning activities. I am old-skool and hence I still prefer analyzing and comparing query plans using the text plans. However, that doesn’t mean that I ignore XML plans, as they have a plethora of information which can be useful for troubleshooting a SQL Server performance problem as well as for query tuning activities.

As you have already figured out from the post title, this is the first post in the XML Plans Saga. All the information that I need to dole out as part of this series will be covered across four blog posts.

You can generate XML Showplan output by using the following methods:

  • Selecting Display Estimated Execution Plan or Include Actual Execution Plan from the query editor toolbar in SQL Server Management Studio
  • Using the Transact-SQL Showplan SET statement options SHOWPLAN_XML (for estimated plan) and STATISTICS XML (actual execution plan)
  • Selecting the SQL Server Profiler event classes Showplan XML (for estimated plan), Showplan XML for Query Compile, and Showplan XML Statistics Profile (actual execution plan) for tracing
  • Using the sys.dm_exec_query_plan dynamic management function

XML Showplans are returned in the nvarchar(max) data type for all of these methods, except when you use sys.dm_exec_query_plan. XML Showplans are returned in the xml data type when you use this dynamic management function.
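To make the SET statement options and the DMF concrete, here is a quick sketch (the SELECT statements are just placeholders; substitute the query you are interested in):

-- Estimated plan: the batch is compiled but NOT executed
SET SHOWPLAN_XML ON;
GO
SELECT name FROM sys.objects;
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan: the batch is executed and runtime statistics are included
SET STATISTICS XML ON;
GO
SELECT name FROM sys.objects;
GO
SET STATISTICS XML OFF;
GO

-- Plans from the plan cache via the DMF (returned as the xml data type)
SELECT cp.usecounts, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp;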

The XML schema for Showplan is available with the SQL Server installation files at the following locations:

SQL Server 2008 and 2008 R2: C:\Program Files\Microsoft SQL Server\100\Tools\Binn\schemas\sqlserver\2004\07\showplan\showplanxml.xsd
SQL Server 2005: C:\Program Files\Microsoft SQL Server\90\Tools\Binn\schemas\sqlserver\2004\07\showplan\showplanxml.xsd

On x64 platforms, the folder is Program Files (x86) instead of Program Files, since the client tools are 32-bit.

I am going to pick a simple query to demonstrate the use of XML plans and gather information from the graphical XML plans obtained from Management Studio. The query that I will be using is:

    select top 100 chardata
    from dbo.tblTechEd1
    where id = 1

    Now the above table doesn’t have a clustered index. The table size is approximately 781MB with 100K rows.
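If you want to play along at home, a hypothetical way to build a comparable heap is shown below. The table and column names come from the demo query; the data types and filler pattern are my assumptions:

-- Hypothetical repro: a heap (no clustered index) of roughly this size
CREATE TABLE dbo.tblTechEd1
(
    id       int        NOT NULL,
    chardata char(7500) NOT NULL   -- wide filler column so each row occupies most of a page
);

-- Manufacture 100K rows; the cross join is just a cheap row generator
INSERT INTO dbo.tblTechEd1 (id, chardata)
SELECT TOP (100000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       REPLICATE('x', 7500)
FROM sys.all_columns AS c1
CROSS JOIN sys.all_columns AS c2;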


    A shot in the arm for sp_purge_data

I recently worked on a performance issue with the Management Data Warehouse job “mdw_purge_data”. This job calls the stored procedure sp_purge_data, which was taking a long time to complete. This stored procedure just got a shot in the arm with an update released by the SQL Server developer working on it. The T-SQL for the modified stored procedure is available on the SQL Agent MSDN blog. The actual stored procedure will be available in an upcoming update for SQL Server 2008 R2.

Addition (July 20th, 2011): SQL Server 2008 R2 Service Pack 1 has various updates for the purge process, so modifying the procedure using the script mentioned in the aforementioned blog post is no longer required.

    A recent discussion on Twitter reminded me that I had this as one of the items to blog about so here it is. Now for some of the interesting technical details.

The poor performance of the stored procedure has been addressed through a refactoring of the stored procedure code, but the issue doesn’t occur on all Management Data Warehouse databases.

The issue occurs when you have large amounts of data stored in [snapshots].[query_stats] for each unique snapshot ID, or when you have orphaned rows in the [snapshots].[notable_query_plan] and [snapshots].[notable_query_text] tables of the Management Data Warehouse. The orphaned rows issue was corrected in the builds mentioned in the KB Article below:

    970014    FIX: The Management Data Warehouse database grows very large after you enable the Data Collector feature in SQL Server 2008
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;970014

The above KB Article also mentions a workaround which allows you to clean up the orphaned rows using a T-SQL script.
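If you want to gauge whether you are hitting the orphaned rows condition before running the clean-up, a hedged sketch along the lines below can help. The join columns are my assumption about the MDW schema; treat the KB Article’s script as authoritative:

-- Rough count of notable plan/text rows no longer referenced by query_stats
SELECT
    (SELECT COUNT(*)
     FROM snapshots.notable_query_plan AS nqp
     WHERE NOT EXISTS (SELECT 1 FROM snapshots.query_stats AS qs
                       WHERE qs.[plan_handle] = nqp.[plan_handle])) AS orphaned_query_plans,
    (SELECT COUNT(*)
     FROM snapshots.notable_query_text AS nqt
     WHERE NOT EXISTS (SELECT 1 FROM snapshots.query_stats AS qs
                       WHERE qs.[sql_handle] = nqt.[sql_handle])) AS orphaned_query_text;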

The “Query Statistics” data collection set collects query statistics, T-SQL text, and query plans of most of the statements that affect performance, which enables analysis of poorly performing queries in relation to overall SQL Server Database Engine activity. It makes use of the DMV sys.dm_exec_query_stats to populate the entries in the snapshots.query_stats table. If this DMV returns a large number of rows each time the collection set runs, the number of rows in the actual Management Data Warehouse table snapshots.query_stats can grow very quickly, which then causes performance issues when the purge job executes on the MDW database.
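A quick way to see how fast that table is growing is a simple aggregate like the one below, run against the MDW database (the snapshot_id column name is my assumption about the MDW schema):

-- Rows captured per snapshot; a steep climb here foreshadows slow purges
SELECT snapshot_id, COUNT(*) AS rows_per_snapshot
FROM snapshots.query_stats
GROUP BY snapshot_id
ORDER BY snapshot_id;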

In summary, if you are using Management Data Warehouse to monitor a highly active server where a lot of ad-hoc or prepared SQL queries are being executed, then I would recommend modifying the sp_purge_data stored procedure using the MSDN blog link I mentioned earlier in my post. I have seen significant performance improvement due to this on the specific environment that I was working on.

T-SQL Tuesday #17: It’s all about APPLYcation this time

It’s time for another round of T-SQL Tuesday and this round of the revolving blog party is being hosted by Matt Velic [Blog | Twitter].

APPLY – That is the topic for this month’s T-SQL Tuesday! The APPLY operator was added to the T-SQL repertoire in SQL Server 2005 and has resulted in lesser use of cursors in a large number of the diagnostic scripts that CSS uses to collect data while working on SQL Performance issues. In this blog post, I shall share a few examples of such queries that we use to collect data while working on SQL Performance cases.

    TOP Query Plan Statistics

    The following query gives you a list of the SQL batches/procedures with their CPU usage, Query/Batch duration and Physical Reads rank. This query helps identify the TOP CPU/Duration/Read consuming queries by making use of system DMVs. The output below is useful for the following reasons:

1. I get the use count of the procedure/batch. If the batch is called multiple times but the use count of the compiled proc cached object is only 1, then the plan is not being re-used. This tells me that I need to look into the reasons preventing plan re-use.

    2. I get the total and average resource usage statistics for each of the queries listed in the output.

3. A quick glance at the output gives me an idea of the most expensive queries on the instance with respect to reads, CPU, and/or query duration.

    
SELECT
    LEFT(p.cacheobjtype + ' (' + p.objtype + ')', 35) AS cacheobjtype,
    p.usecounts,
    p.size_in_bytes/1024 AS size_in_kb,
    PlanStats.total_worker_time/1000 AS tot_cpu_ms,
    PlanStats.total_elapsed_time/1000 AS tot_duration_ms,
    PlanStats.total_physical_reads,
    PlanStats.total_logical_writes,
    PlanStats.total_logical_reads,
    PlanStats.CpuRank,
    PlanStats.PhysicalReadsRank,
    PlanStats.DurationRank,
    LEFT(CASE WHEN pa.value = 32767 THEN 'ResourceDb'
              ELSE ISNULL(DB_NAME(CONVERT(sysname, pa.value)), CONVERT(sysname, pa.value))
         END, 40) AS dbname,
    sql.objectid,
    CONVERT(nvarchar(50),
            CASE WHEN sql.objectid IS NULL THEN NULL
                 ELSE REPLACE(REPLACE(sql.[text], CHAR(13), ' '), CHAR(10), ' ')
            END) AS procname,
    REPLACE(REPLACE(
        SUBSTRING(sql.[text],
                  PlanStats.statement_start_offset/2 + 1,
                  CASE WHEN PlanStats.statement_end_offset = -1
                       THEN LEN(CONVERT(nvarchar(max), sql.[text]))
                       ELSE PlanStats.statement_end_offset/2 - PlanStats.statement_start_offset/2 + 1
                  END),
        CHAR(13), ' '), CHAR(10), ' ') AS stmt_text
FROM
(
    SELECT
        stat.plan_handle,
        statement_start_offset,
        statement_end_offset,
        stat.total_worker_time,
        stat.total_elapsed_time,
        stat.total_physical_reads,
        stat.total_logical_writes,
        stat.total_logical_reads,
        ROW_NUMBER() OVER (ORDER BY stat.total_worker_time DESC) AS CpuRank,
        ROW_NUMBER() OVER (ORDER BY stat.total_physical_reads DESC) AS PhysicalReadsRank,
        ROW_NUMBER() OVER (ORDER BY stat.total_elapsed_time DESC) AS DurationRank
    FROM sys.dm_exec_query_stats stat
) AS PlanStats
INNER JOIN sys.dm_exec_cached_plans p
    ON p.plan_handle = PlanStats.plan_handle
OUTER APPLY sys.dm_exec_plan_attributes(p.plan_handle) pa
OUTER APPLY sys.dm_exec_sql_text(p.plan_handle) AS sql
WHERE (PlanStats.CpuRank < 50
       OR PlanStats.PhysicalReadsRank < 50
       OR PlanStats.DurationRank < 50)
  AND pa.attribute = 'dbid'
ORDER BY tot_cpu_ms DESC
    
    
    

    Top Queries with Similar Query Hash and Query Plan Hash

     

SELECT TOP 10
    query_plan_hash,
    query_hash,
    COUNT(DISTINCT query_plan_hash) AS 'distinct query_plan_hash count',
    SUM(execution_count) AS 'execution_count',
    SUM(total_worker_time) AS 'total_worker_time',
    SUM(total_elapsed_time) AS 'total_elapsed_time',
    SUM(total_logical_reads) AS 'total_logical_reads',
    MAX(REPLACE(REPLACE(
        SUBSTRING(st.[text],
                  qs.statement_start_offset/2 + 1,
                  CASE WHEN qs.statement_end_offset = -1
                       THEN LEN(CONVERT(nvarchar(max), st.[text]))
                       ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
                  END),
        CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text,
    MIN(CAST(query_plan AS varchar(max))) AS 'ShowPlan XML'
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS sp
GROUP BY query_plan_hash, query_hash
ORDER BY SUM(total_worker_time) DESC;
    

This query can help you identify queries which have the same query hash and query plan hash.

The SQL Server Books Online topic “Finding and Tuning Similar Queries by Using Query and Query Plan Hashes” has more information on this. The query hash feature was added in SQL Server 2008 and makes it easier to troubleshoot performance issues caused by ad-hoc queries which differ only in literal values. RML Utilities performs a similar task by creating a query hash, but now, if you are troubleshooting on the server, you can do this using DMVs without having to capture a profiler trace.

The right operand supplied to the APPLY operator is a function of one or more column values that are present in the left operand. So basically, the right operand is a table-valued expression which is evaluated once for each row that appears in the left operand. CROSS APPLY and OUTER APPLY are the two flavors of the APPLY operator. If I wanted to simulate an APPLY operation without the operator itself, it would require the use of temporary tables or table variables.
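A minimal illustration of the two flavors using DMVs (any table-valued function on the right side behaves the same way):

-- CROSS APPLY: returns only the left-side rows for which the right side produced rows
SELECT r.session_id, st.text AS running_statement
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st;

-- OUTER APPLY: keeps every left-side row; right-side columns are NULL when nothing comes back
SELECT c.session_id, st.text AS most_recent_statement
FROM sys.dm_exec_connections AS c
OUTER APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) AS st;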

I use the APPLY operator a lot while parsing XML data like Process Monitor traces or XML query plans, which makes life a lot easier and saves me from writing a huge amount of T-SQL code.
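As an example, here is a sketch which shreds cached XML plans with CROSS APPLY and the xml nodes() method to list the physical operators in each plan (using the standard showplan namespace):

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (10)
       cp.usecounts,
       ops.relop.value('@PhysicalOp', 'varchar(60)') AS physical_operator
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//RelOp') AS ops(relop);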

    Profiler events for different scenarios

    A few moons back I had blogged about the high frequency events in SQL Server. In this post, I shall talk about the common scenarios where profiler traces are collected for SQL Performance issues and what kind of events need to be captured in the server side traces that are configured for the data collection.

While configuring a profiler trace, always include all data columns for the events you capture, as the data columns add far less overhead than additional events do during the trace capture.
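Server-side traces are configured through the sp_trace_* procedures. A minimal hedged sketch is below; the file path is a placeholder and only one event is wired up (event 10 = RPC:Completed, columns 1/12/13 = TextData/SPID/Duration), so repeat sp_trace_setevent for every event and data column you need:

DECLARE @TraceID int, @maxfilesize bigint, @on bit;
SET @maxfilesize = 512;
SET @on = 1;

-- Create the trace; SQL Server appends .trc to the file name
EXEC sp_trace_create @TraceID OUTPUT, 0, N'C:\Traces\PerfTrace', @maxfilesize, NULL;

-- Event 10 = RPC:Completed; add the data columns for the event
EXEC sp_trace_setevent @TraceID, 10, 1, @on;   -- TextData
EXEC sp_trace_setevent @TraceID, 10, 12, @on;  -- SPID
EXEC sp_trace_setevent @TraceID, 10, 13, @on;  -- Duration

-- Start the trace
EXEC sp_trace_setstatus @TraceID, 1;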

    In this post, I shall cover two of the most common issues that CSS collects profiler data for addressing SQL Performance issues: Attentions/Timeouts and High CPU usage/Long Running queries.

    Attentions or Timeouts

Attentions or Timeouts can occur due to a client-side application timeout setting (this is most often the case) or due to a database engine side issue (like a failure to acquire memory, a problem on the server-side connectivity layer, etc.). During this time, you will have a lot of queries starting but not completing successfully. If you are dealing with such a scenario, you would need to capture the Starting events using a profiler trace to find out which statements didn’t complete and encountered the Attention event.

    The events that need to be captured are:

    All Database Events

    Log File Auto Shrink
    Data File Auto Shrink
    Log File Auto Grow
    Data File Auto Grow

    All Errors and Warnings

    Attention (Important for tracking the timeouts)
    ErrorLog
    EventLog
    Exception
    Hash Warning
    Execution Warnings
    Sort Warnings
    Missing Column Statistics
    Missing Join Predicate
    Exchange Spill Event
    Blocked process report
    User Error Message
    Background Job Error
    Bitmap Warning
    Database Suspect Data Page
    CPU threshold exceeded

    Lock:Deadlock Chain and Lock:Timeout (timeout > 0)

    Performance: Auto Stats

    Showplan All or Showplan XML (As execution plan will not be available for queries which didn’t complete)

    Showplan Statistics Profile or Showplan XML Statistics Profile (For comparing estimated and actual query plan)

    Security Audit: Audit Login, Audit Login Failed and Audit Logout

    Stored Procedures events:

    RPC:Starting
    SP:Recompile
    SP:Starting
    SP:Completed
    SP:StmtStarting
    SP:StmtCompleted
    RPC:Completed

Transactions: SQLTransaction (required for tracing back the transaction if there are nested transactions). DTCTransaction is required only if DTC transactions are getting timed out.

    TSQL events:

    SQL:BatchCompleted
    SQL:BatchStarting
    SQL:StmtStarting
    SQL:StmtCompleted
    SQL:StmtRecompile

    Cursor events: CursorRecompile and CursorImplicitConversion (Required for checking if the requested cursor is being implicitly converted or if the cursor is being frequently recompiled).

    The above events should be sufficient to track down any query timeouts and the common root causes for the same.

    High CPU usage/Long Running queries

Another set of issues that are commonly dealt with while troubleshooting SQL Performance issues are Long Running Queries or queries causing high CPU usage on the server. In such a case, the above set of events would definitely be useful. If you are certain that the profiler trace can be captured for the entire duration of the query execution, then for highly transactional servers, you can omit capturing the following events:

SQL:BatchStarting
SQL:StmtStarting
SP:StmtStarting
SP:Starting
RPC:Starting

    The above logic of configuring a profiler trace will also help you capture trace data when you are trying to track down queries which are causing excessive amount of IO on the SQL instance.

    Reference for capturing blocking information on a SQL instance: SQL 2000 Blocker Script output/SQL 2005 PerfStats

    Keep in mind that the profiler data captured may not be sufficient for root cause analysis if you are not capturing SQL blocking information and performance monitor data in conjunction with the profiler traces.

    In case you need additional events like deadlock events or lock events, then refer to my previous post and find out if they fall in the high frequency event category and when to capture the particular event.

    TIP: As much as possible, avoid adding Text filters to your profiler trace configuration as it can lead to high CPU usage for highly active OLTP environments during filter evaluation.

Hope the above information helps you avoid excessive data collection while troubleshooting SQL performance issues.

    Approach to SQL Performance issues

    Performance degradation can happen due to multiple reasons. The main bottlenecks that would affect performance are:

1. Memory: Physical or Virtual – Your server could be experiencing a physical memory crunch, which points us to the available physical RAM, or a crunch on the Virtual Address Space. This involves finding out the queries which consume high memory during compilation or execution, narrowing down on the single biggest contributor, and working towards rectifying the problem with the query.

2. Disk IO: SQL Server queries could be running slowly due to a large amount of IO being done by them or due to a disk performance issue.

3. CPU: The queries executing on your server could be consuming a high amount of CPU time. In such a case, the cause could be statistics that are out-of-date, or data volumes that have grown and introduced data skew.

    4. Blocking: The queries executing on the server could be blocked and waiting for a particular resource and thus increasing the query duration.

    How to perform post-mortem analysis for SQL performance problems?

If you are facing performance degradation on the SQL instance after an upgrade, we would need to analyze the PSSDIAG data collected to find out the top bottlenecks. There are two ways to do this:

1. Comparison Analysis: You identify a query batch or stored procedure which is a problem and provide us a baseline (w.r.t. normal execution time for the query in question). We capture a “fast” and a “slow” run, perform a comparison between the two, and tell you what the biggest bottleneck is: Memory, Disk, CPU, or Database/Environment Settings like database statistics, query SET options, differences in parameters, etc. Based on what is identified as the culprit, you will be able to identify/suggest action items to help remove the culprit.

2. Bottleneck Analysis: In case you do not have a baseline, we can perform a bottleneck analysis and stack rank the bottlenecks for you in case there is more than one. Based on the analysis, you will be able to identify the biggest bottleneck and can implement action items to mitigate it.

If you need to perform live troubleshooting on the server, then you can make use of DMVs if you are on SQL Server 2005 or above, or use the Performance Dashboard (mentioned below).
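A hedged starting point for such live troubleshooting with DMVs is below; it lists what is executing right now, what it is waiting on, and who is blocking it:

-- Currently executing requests with their waits, blockers, and statement text
SELECT r.session_id, r.status, r.wait_type, r.wait_time,
       r.blocking_session_id, r.cpu_time, r.total_elapsed_time,
       st.text AS statement_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE r.session_id > 50;   -- filters out most system sessions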

    RML Utilities Download:

    http://www.microsoft.com/downloads/details.aspx?FamilyId=7EDFA95A-A32F-440F-A3A8-5160C8DBE926&displaylang=en

    More Info on how to use the RML Utilities:

    http://blogs.msdn.com/psssql/archive/2007/12/18/rml-utilities-for-microsoft-sql-server-released.aspx

    http://sqlcat.com/technicalnotes/archive/2008/02/01/precision-performance-for-microsoft-sql-server-using-rml-utilities-9-0.aspx

    If you need to analyze blocking data on the server, then you can use SQL Nexus for the same.

    SQL Nexus Download Link:

    http://www.codeplex.com/sqlnexus

    Another option would be to use SQL Server 2005 Performance Dashboard in case you are testing your application on SQL Server 2005. This tool lets you view your server status without running PSSDIAG and provides reports to identify Long Running queries and also identify which queries are consuming the highest amount of resources (IO/CPU/Memory).

    Performance Dashboard:

    http://www.microsoft.com/downloads/details.aspx?FamilyId=1d3a4a0d-7e0c-4730-8204-e419218c1efc

    Performance Dashboard for SQL Server 2008

    http://blogs.msdn.com/b/sqlserverfaq/archive/2010/05/27/sql-server-performance-dashboard-reports-in-ssms-introduction-install-datediff-error-amp-modified-for-sql-2008.aspx