Why can I not produce the same plan with a copy of the production database?

This question has been asked many times for SQL Server 2005/2008 environments. The answer is that SQL Server has become a bit smarter when it comes to the Query Optimizer: it now takes the hardware environment and the database session state into account as well. (Sometimes the optimizer may be a bit too smart for its own good 🙂 but that is a discussion for another thread.)

To determine an optimal query plan, SQL Server 2005 and 2008 use the following information:

  1. The database metadata – the table statistics should hold the same information, i.e. the same data distribution.
  2. The hardware environment – is the physical memory (RAM) and the number of processors identical on the two machines?
  3. The database session state (for example, the session SET options).

Typically, you must simulate all three of these on a test system if you want to reproduce the behavior of the query optimizer. If you are lucky, you may end up with the same plan even without 2 and 3 being satisfied. In scenarios where you don't, simulating 2 and 3 is a good next step before running off to answer the question:

Why is Server A generating a better plan than Server B?

With the advent of virtualization, simulating the physical memory and CPU configuration is not as big a deal as it used to be. I thought I would answer this question because I have often been asked to explain why the same database backup doesn't produce the same plan as Server A. The first step is always to create a statistics-only copy of your database and see if you can reproduce the plan that you see on your production server. If yes, then you can safely proceed to the next step of troubleshooting the bad plan, i.e. find the most expensive part of the plan and take the necessary steps to tune it: adding covering indexes, creating statistics, changing the join order, adding query hints, and so on.

Very often the SQL CSS team will ask for a statistics clone of your database so that they can reproduce the issue in-house. Such a clone contains no data from the tables, only a copy of the database metadata and statistics. So, if you need to troubleshoot a performance issue where you suspect the query plan is the culprit, you can also use a statistics clone on a test server to check whether you can reproduce the so-called "bad" plan. The reason I mention a test environment is that it is sometimes not possible to troubleshoot a query performance issue on a production server. Moreover, generating a statistics clone for a few tables is much faster than a backup restore of the entire database.

You can use the information mentioned in the article below to create a statistics clone of your database:

How to generate a script of the necessary database metadata to create a statistics-only database in SQL Server 2005 and in SQL Server 2008
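Once the statistics-only copy is in place, one quick sanity check is to compare the statistics histograms on the test server against production; if the clone is faithful, the output should match. A minimal sketch (the table and statistics names here are placeholders for your own objects):

```sql
-- Run on both the production server and the statistics-only copy
-- and compare the histogram steps; they should be identical.
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerID') WITH HISTOGRAM;
```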

How to find out the missing indexes on a SQL Server 2008 or 2005 instance along with the CREATE INDEX commands

I had previously blogged on Missing Indexes and how useful a feature this is for troubleshooting query performance issues. Here is a script for identifying the missing indexes currently reported on your instance. However, you should exercise prudence in implementing the indexes reported by these DMVs, after a due round of testing, and make sure that nothing else is broken by the new indexes.

SELECT CONVERT (decimal(28,1),
           migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) AS improvement_measure,
       'CREATE INDEX missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle)
           + ' ON ' + mid.statement
           + ' (' + ISNULL (mid.equality_columns, '')
           + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
           + ISNULL (mid.inequality_columns, '') + ')'
           + ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
       migs.*,
       mid.database_id,
       OBJECT_NAME(mid.[object_id], mid.database_id) AS objectname
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE CONVERT (decimal(28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) > 10
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC

How to purge MSDB history using T-SQL scripts

One of the Maintenance Plan tasks available is the History Cleanup Task, which can help you keep the MSDB database size in check. This task uses the MSDB stored procedures sp_delete_backuphistory and sp_purge_jobhistory to clean up the historical information in the MSDB system tables.
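For reference, these two procedures can also be invoked directly with a cutoff date. A minimal sketch (the 30-day retention here is just an illustrative choice):

```sql
-- Purge backup history and job history older than 30 days.
DECLARE @cutoff datetime
SET @cutoff = DATEADD(DAY, -30, GETDATE())

EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = @cutoff
EXEC msdb.dbo.sp_purge_jobhistory @oldest_date = @cutoff
```

Calling them directly like this against a large backlog of history is exactly what can cause the log growth problem described below, which is why the looped approach is preferable.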

I have seen multiple scenarios where trying to delete all the historical data in one go has proved fatal, because the large number of logged operations makes the log file grow disproportionately (especially when the autogrow setting is percentage-based). In such scenarios, you are left with two choices:

1. Use the Maintenance Plan GUI to shorten the oldest-date setting of the History Cleanup Task and perform the cleanup by executing the job over shorter intervals.


2. Use the script below to perform what the Cleanup Task would do.   

The script below will print the list of commands to be executed. If you are concerned about the size of your MSDB transaction log file during the purge, uncomment the CHECKPOINT piece in the batch below to keep the transaction log size in check; under the SIMPLE recovery model (the default for MSDB, and I can't think of a reason you would want it FULL), a CHECKPOINT truncates the transaction log.

The script can also be downloaded from TechNet in case you face any formatting issues with the script below. Download link: http://gallery.technet.microsoft.com/ScriptCenter/en-us/b3e2e276-b14a-406d-b67a-d8cbd9e1eabe


-- Declare local variables
declare @mindate datetime, @currdate datetime, @oldestdate datetime, @sql varchar(8000)

-- Find the oldest date in the backupset table
select @mindate = min(backup_start_date) from msdb..backupset
set @currdate = @mindate + 7
set @oldestdate = '2009-11-04 00:00:00.000' -- Modify this to the date till which you want your msdb history purged

-- While loop to generate the commands to purge the MSDB entries
while (@currdate <= @oldestdate)
begin
    set @sql = 'EXEC msdb..sp_delete_backuphistory ''' + cast(@currdate as varchar(20)) + ''''
    print @sql

    set @sql = 'EXEC msdb..sp_purge_jobhistory @oldest_date = ''' + cast(@currdate as varchar(20)) + ''''
    print @sql

    print char(13)

    -- Optional if you are running out of space in MSDB
    --print 'use msdb' + char(13) + 'checkpoint'

    -- Increment the date and move on to the next interval
    set @currdate = @currdate + 7 -- The 7-day interval can be modified to suit your needs
end

-- Final purge up to the target date
set @sql = 'EXEC msdb..sp_delete_backuphistory ''' + cast(@oldestdate as varchar(20)) + ''''
print @sql

set @sql = 'EXEC msdb..sp_purge_jobhistory @oldest_date = ''' + cast(@oldestdate as varchar(20)) + ''''
print @sql



How to find out Security Errors on SQL Server 2005 instance and above including the API call it failed on

More often than not, while troubleshooting SQL Server security issues, you end up performing post-mortem analysis. Under such circumstances, the ring buffer output comes in handy because it retains the security errors reported over a recent period of time.

For example, let's say you tried to create a SQL login with password policy enforcement enabled and provided a password that didn't meet the policy. You would receive an error stating that the password is not complex enough, and this error would be stored in the ring buffer. When you execute the query below, you will find the SPID that raised the error along with the name of the API call that failed. In the sample output for this scenario, you will find that NetValidatePwdPolicy failed.

If you convert the ErrorCode value 0x8C5 to decimal (2245) and run NET HELPMSG 2245 from a command prompt, you will find more information on why the error occurred.
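The hex-to-decimal conversion can be done in T-SQL itself:

```sql
-- 0x8C5 is the ErrorCode value reported in the ring buffer output
SELECT CAST(0x8C5 AS int) AS ErrorCodeDecimal;  -- returns 2245
```

The resulting number is what you pass to NET HELPMSG.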

Output of NET HELPMSG 2245: 

The password does not meet the password policy requirements. Check the minimum password length, password complexity and password history requirements. 

The CallingAPIName and APIName values are very helpful when troubleshooting domain-account-related security errors, since the failing calls are usually Directory Services functions whose documentation can be found on TechNet, making your troubleshooting easier.

Sample Output: 

Notification_time: 2009-11-06 08:37:08.023
ErrorCode: 0x8C5             
CallingAPIName: NetValidatePwdPolicy                                       
APIName: NetValidatePwdPolicy                         
SPID: 53       
Record Id: 0           
Type: RING_BUFFER_SECURITY_ERROR                          

For SQL Server 2005: 

SELECT CONVERT (varchar(30), GETDATE(), 121) AS runtime,
       DATEADD (ms, -1 * ((sys.cpu_ticks / sys.cpu_ticks_in_ms) - a.[Record Time]), GETDATE()) AS Notification_time,
       a.*, sys.ms_ticks AS [Current Time]
FROM (SELECT x.value('(//Record/Error/ErrorCode)[1]', 'varchar(30)') AS [ErrorCode],
             x.value('(//Record/Error/CallingAPIName)[1]', 'varchar(255)') AS [CallingAPIName],
             x.value('(//Record/Error/APIName)[1]', 'varchar(255)') AS [APIName],
             x.value('(//Record/Error/SPID)[1]', 'int') AS [SPID],
             x.value('(//Record/@id)[1]', 'bigint') AS [Record Id],
             x.value('(//Record/@type)[1]', 'varchar(30)') AS [Type],
             x.value('(//Record/@time)[1]', 'bigint') AS [Record Time]
      FROM (SELECT CAST (record AS xml) FROM sys.dm_os_ring_buffers
            WHERE ring_buffer_type = 'RING_BUFFER_SECURITY_ERROR') AS R(x)) a
CROSS JOIN sys.dm_os_sys_info sys
ORDER BY a.[Record Time] ASC

For SQL Server 2008:

SELECT CONVERT (varchar(30), GETDATE(), 121) AS runtime,
       DATEADD (ms, (a.[Record Time] - sys.ms_ticks), GETDATE()) AS [Notification_Time],
       a.*, sys.ms_ticks AS [Current Time]
FROM (SELECT x.value('(//Record/Error/ErrorCode)[1]', 'varchar(30)') AS [ErrorCode],
             x.value('(//Record/Error/CallingAPIName)[1]', 'varchar(255)') AS [CallingAPIName],
             x.value('(//Record/Error/APIName)[1]', 'varchar(255)') AS [APIName],
             x.value('(//Record/Error/SPID)[1]', 'int') AS [SPID],
             x.value('(//Record/@id)[1]', 'bigint') AS [Record Id],
             x.value('(//Record/@type)[1]', 'varchar(30)') AS [Type],
             x.value('(//Record/@time)[1]', 'bigint') AS [Record Time]
      FROM (SELECT CAST (record AS xml) FROM sys.dm_os_ring_buffers
            WHERE ring_buffer_type = 'RING_BUFFER_SECURITY_ERROR') AS R(x)) a
CROSS JOIN sys.dm_os_sys_info sys
ORDER BY a.[Record Time] ASC


Additional Reference 


Misconceptions around -T1118

There has been a lot of debate on whether having the same number of tempdb data files as physical cores helps a SQL instance that is facing latch contention on tempdb allocation pages. This contention typically shows up in sysprocesses or sys.dm_exec_requests (depending on what you use to monitor blocking) as wait resources like 2:1:X or 2:2:X with non-zero wait times.
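A quick way to spot this live is to look for page latch waits on tempdb pages (database_id 2). A minimal sketch; the resource_description pattern is the same database:file:page triplet mentioned above:

```sql
-- Sessions currently waiting on page latches in tempdb.
-- Resources 2:1:1 (PFS), 2:1:2 (GAM) and 2:1:3 (SGAM) point to allocation contention.
SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGE%LATCH%'
  AND resource_description LIKE '2:%';
```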

An article was published for this particular issue which recommended enabling -T1118 and splitting tempdb into multiple data files:

328551    Concurrency enhancements for the tempdb database

This trace flag switches tempdb allocations from single pages at a time (for the first 8 pages of an object) to immediately allocating a full extent (8 pages). It is used to help alleviate allocation bitmap contention in tempdb under a heavy load of small temp table creation and deletion.
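If you do decide to test the trace flag, it can be enabled globally at runtime, though for it to survive a restart you would add -T1118 as a startup parameter via SQL Server Configuration Manager:

```sql
-- Enable TF1118 instance-wide for the current run and confirm its status
DBCC TRACEON (1118, -1);
DBCC TRACESTATUS (1118, -1);
```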

Furthermore, with the advent of monster systems having 64 processors (16 quad cores) and 128 GB of RAM, does it still make sense to split tempdb into that many files on SQL Server 2005 and 2008? One thing that is not commonly known, as Paul mentions in his post below, is that SQL Server 2005 and 2008 maintain a cache of temp tables created on the instance. Because of this caching, even if your temp table creation workload is very high, enabling -T1118 may not benefit you.

A quote from that post:

As far as data files go though, the number has changed. Instead of a 1-1 mapping between processor cores and tempdb data files (*IF* there’s latch contention), now you don’t need so many – so the recommendation from the SQL team is the number of data files should be 1/4 to 1/2 the number of processor cores (again, only *IF* you have latch contention). The SQL CAT team has also found that in 2005 and 2008, there’s usually no gain from having more than 8 tempdb data files, even for systems with larger numbers of processor cores. Warning: generalization – your mileage may vary – don’t post a comment saying this is wrong because your system benefits from 12 data files. It’s a generalization, to which there are always exceptions.
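If latch contention does call for additional tempdb data files, they are added with ALTER DATABASE; the best practice is to make them equal in size so the proportional-fill algorithm spreads allocations evenly. A sketch with placeholder file name, path, and sizes:

```sql
-- Add a second tempdb data file, sized to match the existing one
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2,
          FILENAME = 'T:\tempdb\tempdev2.ndf',
          SIZE = 1024MB,
          FILEGROWTH = 256MB);
```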

Yet again, Paul Randal has published an informative blog post around the usage of -T1118: