Upcoming Zeollar Presentation

I will be presenting a 30-minute demo on how to perform post-mortem analysis of SQL performance issues using RML Utilities and SQL Nexus on 24th June, 2010. The session will be streamed live on Zeollar and a recording will be available afterwards. Zeollar is a Microsoft India DPE initiative that brings you the latest technical content on a daily basis across different channels. Think of it as a personal television that switches on every day at a specified hour, letting you switch channels and view the channel of your interest.

You can download the calendar invite from here.

Token Perm issues on SQL Server 2008

Performance issues were identified on certain SQL Server 2005 builds because of the large size of the access check cache store on servers with huge ad-hoc query workloads. I had blogged about this earlier, along with information on how to customize the cache store size using trace flags and registry entries on builds after a certain version of SQL Server 2005.

With SQL Server 2008, the access check cache bucket count and access check cache quota options can be customized through sp_configure. So, if you do need to change these values, you no longer have to add trace flags or make registry changes. It is as simple as changing max degree of parallelism or max server memory on your server.

However, be advised that changing these values is not recommended unless CSS advises you to do so.
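For reference, here is a minimal sketch of how the two options are surfaced through sp_configure. Both are advanced options, and the statements below merely echo the current settings; do not change them without guidance from CSS.

-- Both options are advanced options, so enable advanced options first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Display the current and configured values for the two options.
EXEC sp_configure 'access check cache bucket count';
EXEC sp_configure 'access check cache quota';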

Reference:

Description of the "access check bucket count" and "access check cache quota" options that are available in the sp_configure stored procedure in SQL Server 2008

Cursor coding horrors

Most coders have pet patterns of one kind or another, and given the opportunity, a developer will fall back on theirs. Coding guidelines are what keep them in check. I personally have a pet pattern in T-SQL when defining a loop: cursors are one of the most commonly used coding constructs, but I tend to stick to my trusted friend, the WHILE loop. I have worked on multiple issues where the T-SQL developer got the next FETCH wrong due to an oversight. Let me demonstrate that with an example. Here is a T-SQL sample which prints out all the user sessions connected to the SQL instance.

declare @session_id int

declare cur_sysprocesses cursor for
select session_id from sys.dm_exec_requests
where session_id > 50    -- user sessions only

open cur_sysprocesses

fetch next from cur_sysprocesses into @session_id

while @@FETCH_STATUS <> -1
begin
    print @session_id
    if (@session_id > 50)
        fetch next from cur_sysprocesses into @session_id
end

close cur_sysprocesses
deallocate cur_sysprocesses

The next FETCH is inside the WHILE loop, and the loop is governed by the value of @@FETCH_STATUS. Since I have defined the condition as NOT EQUAL TO -1, the cursor loop will end when the next fetch is unsuccessful. This is perfectly alright: the conditional check around the fetch is harmless here, because the WHERE clause guarantees that every fetched session_id is greater than 50. But not all cursors in a production environment are that simple. Some have WHILE loops with deep levels of nesting along with conditional blocks. So, let me modify the example and show you:

declare @session_id int

declare cur_sysprocesses cursor for
select session_id from sys.dm_exec_requests    -- WHERE clause removed: system sessions are now included

open cur_sysprocesses

fetch next from cur_sysprocesses into @session_id

while @@FETCH_STATUS <> -1
begin
    print @session_id
    if (@session_id > 50)    -- the next fetch only happens for user sessions
        fetch next from cur_sysprocesses into @session_id
end

close cur_sysprocesses
deallocate cur_sysprocesses

Now the problem with the above cursor is that system processes have session ids of 50 or below. The first fetch from the cursor will return a system spid, the check in the conditional block will never evaluate to true, and so the next fetch will never occur: the cursor loops infinitely, printing the same spid over and over. This seems like a really amateur mistake which you would think is not possible. But believe me, it happens! Because I am showing this with a common system DMV, it may look contrived; with user-defined tables and cursors, it is quite possible. Never define the next fetch from the cursor, within the WHILE loop, inside a conditional block that is dependent on the data being fetched from the table.

Always have the next FETCH at the first level of the WHILE loop, never inside a conditional block. If you need to decide whether to process the data fetched from the cursor, find another way to express that decision; the next FETCH from the cursor should NOT live in an IF block.

It is a human being coding the cursor, and there is always a chance that some unforeseen circumstance (an un-thought-of scenario) sends your cursor into a doomed state (read: infinite loop).

You can never depend on your data to decide the fate of your code path.

To summarize: while looping through your cursor, keep the next FETCH at the first level of your code unless you truly have no other choice, so that you avoid an infinite-loop scenario.
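For completeness, here is a minimal sketch of the safer shape of the same cursor: the next FETCH sits at the first level of the WHILE loop, and the data-dependent condition only decides what is done with the row, never whether the fetch happens.

declare @session_id int

declare cur_sysprocesses cursor for
select session_id from sys.dm_exec_requests

open cur_sysprocesses

fetch next from cur_sysprocesses into @session_id

while @@FETCH_STATUS = 0
begin
    if (@session_id > 50)
        print @session_id    -- process user sessions only
    fetch next from cur_sysprocesses into @session_id    -- always executes
end

close cur_sysprocesses
deallocate cur_sysprocesses

And a sketch of the cursor-free pattern I keep falling back on: a WHILE loop driven by the keys themselves, so there is no fetch to forget.

declare @session_id int

-- Seed with the lowest user session id, then walk the keys upwards.
select @session_id = min(session_id) from sys.dm_exec_requests where session_id > 50

while @session_id is not null
begin
    print @session_id
    select @session_id = min(session_id) from sys.dm_exec_requests where session_id > @session_id
end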

Why can I not produce the same plan with a copy of the production database?

This question has been asked multiple times for SQL Server 2005/2008 environments. Well, the answer is that SQL Server has become a bit smarter when it comes to the Query Optimizer: it now takes the hardware environment and the database session state into account as well. (Sometimes the optimizer may be a bit too smart for its own good 🙂 but that is a discussion not being addressed in this post.)

To determine an optimal query plan, SQL Server 2005 and 2008 use the following information:

  1. The database metadata – The table statistics should hold the same information, i.e. the same data distribution.
  2. The hardware environment – Are the physical memory (RAM) and the number of processors identical on the two machines?
  3. The database session state – for example, plan-affecting SET options.

Typically, you must simulate all of this information if you want to reproduce the behavior of the query optimizer on a test system. If you are lucky, you might end up with the same plan without 2 and 3 being satisfied. In scenarios where you don't, items 2 and 3 would be good to simulate before running off to solve the question:

Why is Server A generating a better plan than Server B?
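As a quick first check, here is a minimal sketch (for SQL Server 2005/2008; the column names changed in later versions) of comparing the hardware inputs the optimizer sees on the two servers:

-- Run on both servers and compare: CPU count and physical memory are
-- among the inputs the optimizer takes into account.
select cpu_count,
       hyperthread_ratio,
       physical_memory_in_bytes / 1048576 as physical_memory_mb
from sys.dm_os_sys_info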

With the advent of virtualization, simulating the physical memory and CPU count is not as big a deal as before. I thought I would answer this question because I have been asked many times to explain why the same database backup doesn't produce the same plan as Server A. The first option is always to create a statistics-only copy of your database and see if you can reproduce the plan that you see on your production server. If yes, then you can safely proceed to the next step of troubleshooting the bad plan, i.e. find the most expensive part of the plan and take the necessary steps to tune it: adding covering indexes, defining statistics, rearranging the join order, adding query hints, etc.

Very often the SQL CSS team will ask for a statistics clone of your database so that they can reproduce the issue in-house. This contains no data from the tables, only a clone of the database metadata. So, in case you need to troubleshoot a performance issue where you suspect the query plan to be the culprit, you can also take a statistics clone and use it on a test server to check whether you can reproduce the so-called "bad" plan. The reason I mention a test environment is that sometimes it is not possible to troubleshoot a query performance issue on a production server. And generating a database statistics clone for a few tables is much faster than a backup and restore of the entire database.

You can use the information mentioned in the article below to create a statistics clone of your database:

How to generate a script of the necessary database metadata to create a statistics-only database in SQL Server 2005 and in SQL Server 2008
http://support.microsoft.com/?kbid=914288
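Once the clone is in place, a quick sanity check is to compare the statistics header for a key index on both servers; the table and index names below are purely illustrative:

-- Compare the output (rows, density, histogram) between production
-- and the statistics-only clone.
DBCC SHOW_STATISTICS ('dbo.MyTable', 'IX_MyTable_MyColumn')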

Misconceptions around -T1118

There has been a lot of debate on whether having the same number of tempdb data files as the number of physical cores helps a SQL instance when you are facing latch contention on tempdb. This contention typically shows up in the wait resource column of sysprocesses or sys.dm_exec_requests (depending on what you use to monitor blocking) with values like 2:1:X or 2:2:X and non-zero wait times.
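A minimal sketch of that check: look for latch waits on pages in database id 2 (tempdb). The well-known allocation bitmap pages are PFS (page 1, then every 8088th page), GAM (page 2) and SGAM (page 3).

-- Non-zero wait times on resources like 2:1:1, 2:1:2 or 2:1:3 point to
-- allocation bitmap contention in tempdb.
select session_id, wait_type, wait_time, wait_resource
from sys.dm_exec_requests
where wait_type like 'PAGE%LATCH%'
  and wait_resource like '2:%'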

An article was published for this particular issue which asked you to enable -T1118 and split tempdb into multiple data files:

328551    Concurrency enhancements for the tempdb database
http://support.microsoft.com/default.aspx?scid=kb;EN-US;328551

This trace flag switches allocations in tempdb from allocating single pages at a time for the first 8 pages of an object to immediately allocating an extent (8 pages). It is used to help alleviate allocation bitmap contention in tempdb under a heavy load of small temp table creation and deletion.

Furthermore, with the advent of new monster systems with 64 processors (16 quad cores) and 128 GB of RAM, would it still make sense to split tempdb into that many files on SQL Server 2005 and 2008? One thing that is not commonly known, as mentioned by Paul in his post below, is that SQL Server 2005 and 2008 maintain a temp table cache for temp tables created on the instance. So even if your temp table creation workload is very high, enabling -T1118 may not benefit you, because the cache avoids the very allocations the trace flag targets.
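As an aside, here is a minimal sketch (assuming SQL Server 2005/2008) of peeking at that temp table cache store through the memory cache counters DMV:

-- Entries in this cache store are cached temp tables and table variables.
select name, type, entries_count, entries_in_use_count
from sys.dm_os_memory_cache_counters
where type = 'CACHESTORE_TEMPTABLES'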

A quote from that post:

As far as data files go though, the number has changed. Instead of a 1-1 mapping between processor cores and tempdb data files (*IF* there’s latch contention), now you don’t need so many – so the recommendation from the SQL team is the number of data files should be 1/4 to 1/2 the number of processor cores (again, only *IF* you have latch contention). The SQL CAT team has also found that in 2005 and 2008, there’s usually no gain from having more than 8 tempdb data files, even for systems with larger numbers of processor cores. Warning: generalization – your mileage may vary – don’t post a comment saying this is wrong because your system benefits from 12 data files. It’s a generalization, to which there are always exceptions.

Yet again, Paul Randal has published an informative blog post around the usage of trace flag 1118:

http://www.sqlskills.com/BLOGS/PAUL/post/Misconceptions-around-TF-1118.aspx