T-SQL Tuesday #49: Why was SQL waiting?


A lot of my customer engagements involve identifying waits experienced by SQL Server queries on a production environment. Since this month’s topic is WAITS… I see this as a good opportunity to talk about how I go about the analysis of the collected data to identify the blocking chains.

Before I start rambling off, a note of thanks to my friend Robert Davis [B|T] for hosting this month’s T-SQL Tuesday! He couldn’t have chosen a better topic! 🙂

In this post, I will be talking about how to slice and dice the diagnostic data collected by SQL Perf Stats script that is imported into a SQL Nexus database.

The tables of interest in the SQL Nexus database are:

1. tbl_Requests – This table holds the combined output of the snapshots collected by the Perf Stats script from the sys.dm_exec_requests, sys.dm_exec_sessions, sys.dm_exec_connections and sys.dm_os_tasks

2. tbl_NotableActiveQueries – This table holds the output sys.dm_exec_sql_text and sys.dm_exec_query_stats snapshots collected by the Perf Stats script.

The file that needs to be present when SQL Nexus is imported the collected diagnostic data is <server name>__SQL_2008_Perf_Stats_Startup.OUT. The plethora of information available in this file can be mind boggling at times. So pushing all this information into a database makes a lot of sense.

The first order of the day is to determine if there was actually blocking experienced during the data collection. The output of the query will give you collected snapshots where blocking was detected.

<span style="font-size: small;">select runtime,count (*) as blocked_processes
from tbl_requests
where blocking_session_id > 0 and session_id <> blocking_session_id
group by runtime
having count(*) > 1
order by runtime

The next order of the day is to find out which sessions were blocked. The SQL Nexus reports do a good job of drilling down into each blocking session observed. But if you want a collated view of all the blocking along with the list of head blockers, then the following T-SQL script could prove useful especially when the .OUT file being imported is a few hundred MBs in size! This would help you slice and dice the MBs or GBs of data that was imported into the SQL Nexus database tables and get a consolidated view of the blocking chains.

/*
This Sample Code is provided for the purpose of illustration only and is not intended to be used in a production environment. THIS SAMPLE CODE AND ANY RELATED INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE. We grant You a nonexclusive, royalty-free right to use and modify the Sample Code and to reproduce and distribute the object code form of the Sample Code, provided that You agree: (i) to not use Our name, logo, or trademarks to market Your software product in which the Sample Code is embedded; (ii) to include a valid copyright notice on Your software product in which the Sample Code is embedded; and (iii) to indemnify, hold harmless, and defend Us and Our suppliers from and against any claims or lawsuits, including attorneys’ fees, that arise or result from the use or distribution of the Sample Code.

Author: Amit Banerjee

Modified: December 10, 2013

Description:

This T-SQL script extracts information about the blocking chains from the SQL Nexus database

*/

set nocount on
print '================== Blocking Analysis ====================='
print ''
print 'Runtime Information'
select min(runtime) as [Start Time], max(runtime) as [End Time] from tbl_Requests

if ((select count(*) from tbl_requests where blocking_session_id <> 0) > 0)
begin
print 'Blocking found'
print ''
print '============= Blocking Chains (Runtimes considered where blocking_session_id processes > 5) ==================='
-- Show a summary of the # of blocked sessions for every snapshot collected by the PerfStats script
select runtime,count (*) as blocked_processes
from tbl_requests
where blocking_session_id > 0 and session_id <> blocking_session_id
group by runtime
having count(*) > 1 -- The threshold can be changed as appropriate
order by runtime

print ''
print ''
print '================= Head Blocker Information ==================='
if (object_id('tmp_runtimes') <> NULL)
begin
drop table tmp_runtimes
end

-- Create a table to store the runtimes
CREATE TABLE tmp_runtimes(
[row_num] [bigint] IDENTITY(1,1) NOT NULL,
[runtime] datetime
) ON [PRIMARY]

-- Create a list of the distinct runtimes
insert into tmp_runtimes
select distinct (runtime)
from tbl_requests
where blocking_session_id <> 0
order by runtime

-- Start a while loop to fetch the data for the head blocker and blocked sessions
declare @count int,@loop int,@runtime datetime
select @count = count(*) from tmp_runtimes
set @loop = 1
while (@loop <> @count)
begin
select @runtime = runtime from tmp_runtimes where row_num = @loop

print '~~~~~~~~~~~~~~~ Runtime: '
select @runtime
print ''
print '~~~ Head Blocker'

select a.session_id,a.ecid,a.blocking_ecid,wait_type,wait_duration_ms,command,b.dbname,b.procname,b.stmt_text
from tbl_requests a
left outer join tbl_notableactivequeries b
on a.runtime = b.runtime and a.session_id = b.session_id
where blocking_session_id = 0 and a.runtime = @runtime and a.session_id in
(select blocking_session_id
from tbl_requests where blocking_session_id <> 0 and runtime = @runtime)

print ''
print '~~~ Blocked Session Information'
select *
from tbl_requests a
left outer join tbl_notableactivequeries b
on a.runtime = b.runtime and a.session_id = b.session_id
where a.blocking_session_id <> 0
and a.runtime = @runtime
order by a.session_id,a.blocking_session_id

set @loop = @loop + 1

end

drop table tmp_runtimes

end

else
begin
print 'No Blocking found'
end

set nocount off

With the help of the above output, you can get a collated view of all the blocking chains encountered during the data collection period. CSS engineers are Microsoft use SQL Nexus extensively for analyzing diagnostic data collected for SQL Server performance issues. I hope you find this script useful in analyzing blocking chains and do keep the feedback coming in. The above script is available for download on SkyDrive as well.

Please note that the PerfStats script collects data at 10 second intervals by default! If your blocking chains are for shorter durations then you will need to change the waitfor delay interval. If your blocking duration is lesser than a second, then data collection using the DMVs would not be of much help and you would need to go down the path of Extended Events tracing or other invasive forms of data collection (like XPerf utility).

Advertisements

Concurrent backups: What is the big deal?


You are probably wondering why am I writing a blog post on concurrent backups! Well this is a topic that brings up a few questions during my customer visits. Apart from that the biggest reason, I want to talk about concurrent backups is because of backup softwares that you have active in your environments.

A lot of the backup softwares available in the market have the option of scheduling backups of the entire system. This involves backing up of SQL Server databases using the interfaces exposed through sqlvdi.dll. Ain’t your software really smart? However, sometimes the right hand doesn’t really talk to the left hand! It is quite possible that your server admins have scheduled backups of your systems around the same time you have scheduled a maintenance plan to take a backup of the SQL Server instance.

If you initiate two differential or full database backups concurrently, then you will experience a blocking condition as shown in the table below. SPID 53 was taking a VDI full-backup using the SQL Server Backup Simulator tool where as SPID 55 was also taking a native full-backup of the same database. In such a scenario, you will find that the backup which was running first will complete and only then will the second backup will continue. The second concurrent backup will be blocked during the entire duration of the first backup.

spid blocked waittime lastwaittype waitresource cmd dbid
53 0 6302 ASYNC_IO_COMPLETION   BACKUP DATABASE 8
53 0 0 MISCELLANEOUS   BACKUP DATABASE 8
53 0 412 BACKUPBUFFER   BACKUP DATABASE 8
53 0 412 BACKUPIO   BACKUP DATABASE 8
55 53 2386 LCK_M_U DB: 8 [BULKOP_BACKUP_DB] BACKUP DATABASE 8

This issue doesn’t manifest itself when you are you taking a volume snapshot backup of the database. The wait resource will always be DB: <dbid> [BULKOP_BACKUP_DB] with a wait type of LCK_M_U (update lock) for the second session that is running the concurrent backup operation. Concurrent log backups are allowed though starting from SQL Server 2005.

One of the situations where this can pose to be problematic is when the application that installed the database creates a maintenance plan during the application setup process. This can cause problems if you have concurrent full or differential backups scheduled from another software during the same time when your backup maintenance plan is running. Another big pitfall of concurrent backups running is that sometimes, the same job is responsible for both full and log backups of the database. If your log backup job doesn’t kick-off because your database backup is still being blocked, then you could run into a bloated transaction log issue for a database in full or bulk-logged recovery model. My colleague, Arvind, has also blogged about the same issue a few years back.

The following T-SQL script can help you determine if you have overlapping backups being taken for a SQL Server 2012 function. This script can also work for releases lower than SQL Server 2012 if you remove the LAG function as that T-SQL enhancement was introduced in SQL Server 2012.

select backup_start_date, backup_finish_date,

LAG(backup_start_date, 1) OVER (PARTITION BY database_name ORDER BY backup_start_date) as previous_start_date,

CASE datediff(mi,(LAG(backup_start_date, 1) OVER (PARTITION BY database_name ORDER BY backup_start_date)),backup_start_date) WHEN 0 THEN 'OVERLAPPING' ELSE '-' END as [Status] ,

datediff(mi,backup_start_date,backup_finish_date) as backup_time_mins, type

from msdb..backupset

where database_name = '<database name>' -- Replace with appropriate database name

and type <> 'L'

order by backup_start_date

Approach to SQL Performance issues


Performance degradation can happen due to multiple reasons. The main bottlenecks that would affect performance are:

1. Memory: Physical or Virtual – Your server could be experiencing a physical crunch which would point us to the Available Physical RAM or a crunch on the Virtual Address Space. This would involve finding out the queries which are consuming high memory during compilation or execution and narrow down on the single biggest contributor and work towards rectifying the problem with the query.

2. Disk IO: There could be an issue with SQL Server queries running slowly due to large amount of IO being done by them or a disk performance issue.

3. CPU: It could be because the queries executing on your server are consuming high amount of CPU time. In such a case, it be statistics that are out-of-date or data that has increased which introduces data skew.

4. Blocking: The queries executing on the server could be blocked and waiting for a particular resource and thus increasing the query duration.

How to perform post-mortem analysis for SQL performance problems?

If you are facing a performance degradation on the SQL instance after an upgrade, we would need to analyze the PSSDIAG collected to find out the top bottlenecks. There are two ways to do this:

1. Comparison Analysis: You identify the a query batch or stored procedure which is a problem and provide us a baseline (w.r.t. normal execution time for the query in question). We capture a “fast” and a “slow” run and we perform a comparison between the two runs and we tell you what is biggest bottleneck: Memory, Disk, CPU or Database/Environment Settings like database statistics, query SET options, difference in parameters etc. Based on what is identified as the culprit, you will be able to identify/suggest action items to help remove the culprit.

2. Bottleneck Analysis: In case you do not have a baseline, you can perform a bottleneck analysis and then stack rank them for you in case there is more than one bottleneck. Based on the analysis, you will be able to identify the biggest bottleneck and can implement action items to mitigate the single largest bottleneck.

If you need to perform live troubleshooting on the server, then you can make use of DMVs if you are on SQL Server 2005 or above or use Performance Dashboard (mentioned below).

RML Utilities Download:

http://www.microsoft.com/downloads/details.aspx?FamilyId=7EDFA95A-A32F-440F-A3A8-5160C8DBE926&displaylang=en

More Info on how to use the RML Utilities:

http://blogs.msdn.com/psssql/archive/2007/12/18/rml-utilities-for-microsoft-sql-server-released.aspx

http://sqlcat.com/technicalnotes/archive/2008/02/01/precision-performance-for-microsoft-sql-server-using-rml-utilities-9-0.aspx

If you need to analyze blocking data on the server, then you can use SQL Nexus for the same.

SQL Nexus Download Link:

http://www.codeplex.com/sqlnexus

Another option would be to use SQL Server 2005 Performance Dashboard in case you are testing your application on SQL Server 2005. This tool lets you view your server status without running PSSDIAG and provides reports to identify Long Running queries and also identify which queries are consuming the highest amount of resources (IO/CPU/Memory).

Performance Dashboard:

http://www.microsoft.com/downloads/details.aspx?FamilyId=1d3a4a0d-7e0c-4730-8204-e419218c1efc

Performance Dashboard for SQL Server 2008

http://blogs.msdn.com/b/sqlserverfaq/archive/2010/05/27/sql-server-performance-dashboard-reports-in-ssms-introduction-install-datediff-error-amp-modified-for-sql-2008.aspx