Approach to SQL Performance issues

Performance degradation can happen due to multiple reasons. The main bottlenecks that would affect performance are:

1. Memory: Physical or Virtual – Your server could be experiencing a physical crunch which would point us to the Available Physical RAM or a crunch on the Virtual Address Space. This would involve finding out the queries which are consuming high memory during compilation or execution and narrow down on the single biggest contributor and work towards rectifying the problem with the query.

2. Disk IO: There could be an issue with SQL Server queries running slowly due to large amount of IO being done by them or a disk performance issue.

3. CPU: It could be because the queries executing on your server are consuming high amount of CPU time. In such a case, it be statistics that are out-of-date or data that has increased which introduces data skew.

4. Blocking: The queries executing on the server could be blocked and waiting for a particular resource and thus increasing the query duration.

How to perform post-mortem analysis for SQL performance problems?

If you are facing a performance degradation on the SQL instance after an upgrade, we would need to analyze the PSSDIAG collected to find out the top bottlenecks. There are two ways to do this:

1. Comparison Analysis: You identify the a query batch or stored procedure which is a problem and provide us a baseline (w.r.t. normal execution time for the query in question). We capture a “fast” and a “slow” run and we perform a comparison between the two runs and we tell you what is biggest bottleneck: Memory, Disk, CPU or Database/Environment Settings like database statistics, query SET options, difference in parameters etc. Based on what is identified as the culprit, you will be able to identify/suggest action items to help remove the culprit.

2. Bottleneck Analysis: In case you do not have a baseline, you can perform a bottleneck analysis and then stack rank them for you in case there is more than one bottleneck. Based on the analysis, you will be able to identify the biggest bottleneck and can implement action items to mitigate the single largest bottleneck.

If you need to perform live troubleshooting on the server, then you can make use of DMVs if you are on SQL Server 2005 or above or use Performance Dashboard (mentioned below).

RML Utilities Download:

http://www.microsoft.com/downloads/details.aspx?FamilyId=7EDFA95A-A32F-440F-A3A8-5160C8DBE926&displaylang=en

More Info on how to use the RML Utilities:

http://blogs.msdn.com/psssql/archive/2007/12/18/rml-utilities-for-microsoft-sql-server-released.aspx

http://sqlcat.com/technicalnotes/archive/2008/02/01/precision-performance-for-microsoft-sql-server-using-rml-utilities-9-0.aspx

If you need to analyze blocking data on the server, then you can use SQL Nexus for the same.

SQL Nexus Download Link:

http://www.codeplex.com/sqlnexus

Another option would be to use SQL Server 2005 Performance Dashboard in case you are testing your application on SQL Server 2005. This tool lets you view your server status without running PSSDIAG and provides reports to identify Long Running queries and also identify which queries are consuming the highest amount of resources (IO/CPU/Memory).

Performance Dashboard:

http://www.microsoft.com/downloads/details.aspx?FamilyId=1d3a4a0d-7e0c-4730-8204-e419218c1efc

Performance Dashboard for SQL Server 2008

http://blogs.msdn.com/b/sqlserverfaq/archive/2010/05/27/sql-server-performance-dashboard-reports-in-ssms-introduction-install-datediff-error-amp-modified-for-sql-2008.aspx

SQL Server Version Numbers

Understanding SQL version numbers can be a bit confusing at times and remembering the builds an even bigger hassle. I just replied with the following post on MSDN recently. I thought I would convert the post into a blog post for other people to see and use as appropriate.

SQL Server has the following components:

1. Database engine (Engine, Replication, Full-text)

2. SSIS – Integration Services (SQL Server 2005 and above)

3. Reporting Services

4. Tools and Shared Components (Eg. SSMS)

5. Notification Services (SQL Server 2005 and above)

6. Analysis Services

When you install a SQL Server component, the RTM version of that component is installed. The SQL Server release version has a version major number which will stay constant for a release. The 4 digit number that keeps varying is the build number of the current version of your component. This number changes whenever you apply a hotfix/security fix/cumulative update/service pack for that particular component of SQL Server. Very rarely do the build numbers remain constant across patches (Seen to happen in Reporting Services for certain builds).

Version major numbers: This is the first number in the version output:

SQL Server 7.0 – 7.00.XXXX

SQL Server 2000 – 8.00.XXXX

SQL Server 2005 – 9.00.XXXX

SQL Server 2008 – 10.00.XXXX

SQL Server 2008 R2- 10.50.XXXX

The above article has all the build numbers that were released for SQL Server.

Additional reference:

http://support.microsoft.com/kb/321185

http://www.sqlsecurity.com/FAQs/SQLServerVersionDatabase/tabid/63/Default.aspx

Cloning SQL Permissions for users

I have an application which has a bunch of specific permissions granted to some objects within the database. I encountered a problem while trying to migrate this application from one server to another because I needed to recreate these permissions which were added over the period of a year as and when new features were added to the application.

I ended up using sp_helprotect to list out the permissions for the objects that I was interested in. But I still needed to convert the permissions into the necessary GRANT/REVOKE statements. This can be tedious when you are trying to manipulate permissions across databases and multiple objects. What I ended up doing was writing a crude T-SQL script to help me create the necessary GRANT/REVOKE scripts based on the sp_helprotect output.

Note: sp_helprotect does not return information about securables that were introduced in SQL Server 2005. Use sys.database_permissions and fn_builtin_permissions instead.

So, if you have a securable that you are interested in which was introduced in SQL Server 2005, then this is not going to help you.

Note: This script has been tested only for the following objects:

1. Tables

2. Stored Procedures

3. Functions (UDFs/TVFs)

Modification: Aug 8, 2012: Thanks to Graham [blog] for suggesting the corrections.

Script:

set nocount on

-- Declare and initialize local variables
declare @newuser varchar(255)
set @newuser = 'newuser' -- This is the new user that you need to add permissions for

-- Create temp table to store sp_helprotect output

create table #tbl_permissions
(Owner varchar(50),
Object varchar(255),
Grantee varchar(255),
Grantor varchar(255),
ProtectType varchar(50),
Action varchar(50),
Col varchar(50)
)

-- Store sp_helprotect output in a temp table
insert into #tbl_permissions
exec sp_helprotect null,'olduser'

-- Get the necessary GRANT/REVOKE/DENY T-SQL commands
select RTRIM(LTRIM(ProtectType)) + ' ' +
CASE Action WHEN 'CONNECT' THEN Action + ' TO ['+@newuser+']'
ELSE Action + ' ON OBJECT::'+OWNER+'.'+OBJECT+' TO [' +@newuser+']'
END
from #tbl_permissions

-- Drop the temporary table
drop table #tbl_permissions

Now that a bit of my work was easier, I decided to put this script out on the web so that someone else can benefit from the same. I am sure that there are multiple T-SQL scripts which do the same but I found the one below very helpful. I shall try and make it more comprehensive by using sys.database_permissions in a future blog post.

[Blog Update] My last month’s (May) posts on SQLServerFAQ

This has been an age long debate on whether to use GUIDs or INTs. Thought I would pen down some points based on some tests that I ran on my test machine.

GUID vs INT Debate

SQL Backup softwares can cause Out Of Memory conditions if the BufferCount and MaxTransferSize is not taken into account on 32-bit systems:

Incorrect BufferCount data transfer option can lead to OOM condition

In the recent past, I worked on a few issues where the use of monotonically increasing clustered index keys were turning into hotspots in the database leading to latch blocking and causing performance degradation as an end result. I illustrate this with the help of an example in the blog post below.

Monotonically increasing clustered index keys can cause LATCH contention

Replication Agent has not logged a message in 10 minutes

I saw multiple posts on MSDN regarding the following message: 

“The replication agent has not logged a progress message in 10 minutes. This might indicate an unresponsive agent or high system activity.” 

More often than not the above message can be ignored safely. The next obvious question is when can this message be ignored. If you find that your replication agent history is not reporting any error messages, then the above message is benign. Typically, this message means the agent is busy doing its work and has no resources to respond to status inquiries. 

Why does this error message get generated? 

This error message gets generated because of the Distribution heartbeat interval property. This property governs how long an agent can run without logging a progress message. If your replication agents are not reporting an error message and you are seeing the above message, then you could change your heartbeat interval to a higher value. One of the option could be that you changed the history logging option for your replication agent so that it doesn’t log any message. 

Query: 

exec sp_changedistributor_property @property = 'heartbeat_interval', @value = <value in minutes>;

 

Reference: sp_changedistributor_property (Transact-SQL) 

I have read lots of opinions as to why this is a bug and if you subscribe to that school of thought, please create a Connect item at http://connect.microsoft.com/sqlserver.