CPU Drift Issues

I have seen a few cases where administrators have been concerned with CPU Drift and think that the SQL Server ERRORLOG reporting the following message is a serious cause for concern:

Error message 1
The time stamp counter of CPU on scheduler id 2 is not synchronized with other CPUs.
Error message 2

CPU time stamp frequency has changed from 191469 to 1794177 ticks per millisecond. The new frequency will be used

The SQL Server ERRORLOG reports a variety of informational, error and warning messages and not all messages are problems. This message is just telling you that CPU frequency between one or more processors is not synchronized. And how does this affect you??
Quoting from one of the below articles:

"Generally the Microsoft SQL Server support team considers drift less than several seconds, noise."

If you are concerned that the drift values are actually affecting your test results, then it would be a good idea to have the Speed Step, Power Now etc. features turned off during your testing phase. This would require changes at the BIOS level. Also, it would be a good idea to have consulted your H/W manufacturer and find out if there are any updates that require to be installed. Once again, I reiterate unless the drift values are constantly reporting several seconds for prolonged periods, only then do we have a Beginning of a problem, otherwise these warnings are mostly noise.
Additionally, trace flag (–T8033) can be used to suppress the drift warnings. However, please do not enable this trace flag on an instance of SQL Server 2005 unless and until, you fully understand the ramifications of ignoring the drift warnings.

Related Links
SQL Server timing values may be incorrect when you use utilities or technologies that change CPU frequencies
http://support.microsoft.com/kb/931279/en-us
SQL Server 2005 SP2 will introduce new messages to the error log related to timing activities
http://blogs.msdn.com/psssql/archive/2006/11/27/sql-server-2005-sp2-will-introduce-new-messages-to-the-error-log-related-to-timing-activities.aspx
SQL Server 2005 – RDTSC Truths and Myths Discussed
http://blogs.msdn.com/psssql/archive/2007/08/19/sql-server-2005-rdtsc-truths-and-myths-discussed.aspx

Finding out root cause for Cluster Failovers

We do get quite a few issues regarding root cause analysis for Cluster Failovers. Failovers mostly happen due to the IsAlive check failing for the SQL Server resource after which the following two conditions arise:
1. SQL Server service restarts on the same cluster node
2. SQL Server resource fails over to a member cluster node

So, for looking into the possible root causes of a cluster failover, a SQL version of the MPS Reports capture is required from the node on which SQL is currently active. From the data requested by the PSS Engineer, the following files would be of utmost importance:
1. All the SQL Server ERRORLOGs
2. Windows Event Logs (System/Application)
3. Cluster Log

Based on the SQL Server ERRORLOGs, we would check for any errors or tell-tale signs which would point us to why the IsAlive check failed for the SQL Server resource. After that, we would look into the cluster log and the windows event logs to find out co-relation among the events during the failover time on the server.

Since, the cluster log rolls over and also the SQL Server ERRORLOGs can roll over very quickly if a job is in place to recycle if after a certain size, it is a very good idea to save the cluster log and the SQL Server ERRORLOG(s) right after the failover to prevent them from rolling over and overwriting valuable data from the problem time period.

Sometimes, a post mortem analysis provides us a hypothesis of what happened but doesn’t paint the picture completely due to lack of data from the period the problem happened. Based on the nature of the problem, the PSS Engineer might ask to you to do the following for the the next problem occurrence along :
1. Capture a light-weight PSSDIAG round the clock with file rollover so that we can track what sort of events were happening on the SQL instance right before the failover.
2. Or a filtered dump of the SQL Process during the problem period if there is heavy blocking on the server or if the failover had occurred due to memory dump(s) on the server
3. OR a round the clock Perfmon log if there were possible external memory pressure on the server.

NOTE: Keep in mind that the cluster logs are always in GMT format. So you need to add/substract the time difference between your timezone and GMT when analyzing the cluster logs.

Allow Updates Option for SQL Server 2005

SQL Server 2005 doesn’t have the allow updates option. So, if you execute:

sp_configure ‘allow_updates’,1

and then if you execute reconfigure, you would get the following error:

Msg 5808, Level 16, State 1, Line 2
Ad hoc update to system catalogs is not supported.

After this all changes to the sp_configure settings followed by a reconfigure would yield this error. To rectify this, you will have to change the allow_updates option back to 0 and run reconfigure. As per SQL Server 2005 Books Online:

This option is still present in the sp_configure stored procedure, although its functionality is unavailable in Microsoft SQL Server 2005 (the setting has no effect). In SQL Server 2005, direct updates to the system tables are not supported.

Changing the allow updates option will cause the RECONFIGURE statement to fail. Changes to the allow updates option should be removed from all scripts.

So, if in case you use allow_updates in any script in SQL Server 2005, please refrain from doing so. Updates to the system catalogs are not permitted in SQL Server 2005 and any attempt/changes made to the System Resource database would get you into an unsupported scenario.

WMI Tracing equivalent of SQL Server Profiler Traces

A lot of companies find the need to monitor all DDL and DML activity on the server for Compliance reasons. Though SQL Profiler Traces provides this functionality by using Stored Procedures and Functions. These system stored procedures can be used from within your own applications to create traces manually, instead of using SQL Server Profiler. This allows you to write custom applications specific to the needs of your enterprise. However, you might choose not to use this approach due to the overhead of running profiler traces on the server. You can use server side profiler traces which tend to have a lesser performance impact on the server. 

Another method is to use WMI Events to monitor SQL Server Events. Consider the following example, let’s say you want to monitor the SQL Server events for all Update Statistics Events. Then you can use script below to create the WMI Alert and also the job to put the event details into a SQL Server database table. 

<script> 

USE TestDB 

GO 

-- Creating the table to store the DDL Events 

CREATE TABLE [dbo].[UPD_STATS_Events]( 

[AlertTime] [datetime] NULL, 

[SPID] [int] NULL, 

[DBName] [nvarchar](100) NULL, 

[TextData] [nvarchar](max) NULL 

) ON [PRIMARY] 

GO 

-- Adding the job to run when the WMI Alert is raised 

EXEC msdb.dbo.sp_add_job @job_name=N'Capture Update Statistics Events', 

@enabled=1, 

@description=N'Job for responding to DDL events' ; 

GO 

-- Adding the job step 

EXEC msdb.dbo.sp_add_jobstep 

@job_name = N'Capture Update Statistics Events', 

@step_name=N'Insert data into Update Statistics Events table', 

@step_id=1, 

@on_success_action=1, 

@on_fail_action=2, 

@subsystem=N'TSQL', 

@command= N'INSERT INTO UPD_STATS_Events 

(AlertTime, Spid,DBName,TextData) 

VALUES (getdate(), $(ESCAPE_NONE(WMI(SPID))), 

''$(ESCAPE_SQUOTE(WMI(DatabaseName)))'', 

''$(ESCAPE_SQUOTE(WMI(TSQLCommand)))'')', 

@database_name=N'TestDB' ; 

GO 

-- Set the job server for the job to the current instance of SQL Server. 

EXEC msdb.dbo.sp_add_jobserver @job_name = N'Capture Update Stats Events' ; 

GO 

-- For the named instance you need to use \\.\root\Microsoft\SqlServer\ServerEvents\<INSTANCE NAME> 

EXEC msdb.dbo.sp_add_alert @name=N'Respond to Update Statistics Events', 

@wmi_namespace=N'\\.\root\Microsoft\SqlServer\ServerEvents\MSSQLSERVER', 

@wmi_query=N'SELECT * FROM UPDATE_STATISTICS', 

@job_name='Capture Update Statistics Events' ; 

GO

</script> 

You will have to give the SQL Agent account WMI Token replacement rights. You need to create a registry key in SQLAgent for this.
Add a Reg_DWORD HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL
Server\MSSQL.x\SQLServerAgent\AlertReplaceRuntimeTokens
and set it to 1.
and no SQLAgent restarts needed. 

Where x = the number of the SQL instance. This can be found out by looking under the regisry key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\Instance Names\SQL 

Whenever an update statistics task is fired against a database, the time when the command was executed, the SPID number, the database name and the T-SQL Command command associated with the event would be put into the table in the TestDB database. 

Service Broker should be enabled for the MSDB and the database in which you are storing the event details. 

Information about the namespaces can be found on MSDN. Also, you could use WMI Code Creator to browse through the namespace and the classes available and their properties. 

Furthermore, you can also use the WMI Event Watcher Task or the WMI Data Reader Task of SQL Server 2005 SSIS to perform the same.

The product has been patched with more recent updates

I have seen a few SQL Server 2005 Failover Clusters running into this issue. Recently, a KB Article was published explaining 2 methods (KB934749) to resolve this issue.

One scenario when you can run into this issue is when you have a SQL Server 2005 Failover Cluster patched with Service Pack 2 or higher and you add a new node to the Failover Cluster, you could run into the issue then.

Another scenario is that the SQL Server binaries on the one node got upgraded to a higher build but one of the member nodes was not upgraded due to some fatal error. But this is a very very rare scenario because SQL Server 2005 setup makes sure it patches all the member nodes where the SQL instance is not active before patching the node on which the SQL instance is installed.