CPU Drift Issues

Posted on December 30, 2009 by Amit Banerjee

I have seen a few cases where administrators have been concerned with CPU Drift and think that the SQL Server ERRORLOG reporting the following message is a serious cause for concern:

Error message 1
The time stamp counter of CPU on scheduler id 2 is not synchronized with other CPUs.
Error message 2
CPU time stamp frequency has changed from 191469 to 1794177 ticks per millisecond. The new frequency will be used

The SQL Server ERRORLOG reports a variety of informational, warning and error messages, and not every message indicates a problem. These messages are just telling you that the CPU time stamp counters or frequencies are not synchronized across processors. So how does this affect you?
Quoting from one of the articles linked below:

"Generally the Microsoft SQL Server support team considers drift less than several seconds, noise."

If you are concerned that the drift values are actually affecting your test results, it would be a good idea to turn off features such as SpeedStep and PowerNow! during your testing phase. This requires changes at the BIOS level. It would also be a good idea to consult your hardware manufacturer and find out whether there are any updates that need to be installed. Once again, I reiterate: only when the drift values constantly report several seconds for prolonged periods do we have the beginning of a problem. Otherwise, these warnings are mostly noise.
Additionally, trace flag 8033 (startup parameter -T8033) can be used to suppress the drift warnings. However, please do not enable this trace flag on an instance of SQL Server 2005 unless you fully understand the ramifications of ignoring the drift warnings.
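Before changing anything, it can help to check whether the trace flag is already enabled on the instance. A minimal sketch using DBCC TRACESTATUS, which only reports status and changes nothing:

```sql
-- Report the status of trace flag 8033.
-- The -1 argument asks for trace flags that are enabled globally.
DBCC TRACESTATUS (8033, -1);
```

A Status value of 1 in the output means the flag is on, in which case the drift warnings are being suppressed.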

Related Links
SQL Server timing values may be incorrect when you use utilities or technologies that change CPU frequencies
http://support.microsoft.com/kb/931279/en-us
SQL Server 2005 SP2 will introduce new messages to the error log related to timing activities
http://blogs.msdn.com/psssql/archive/2006/11/27/sql-server-2005-sp2-will-introduce-new-messages-to-the-error-log-related-to-timing-activities.aspx
SQL Server 2005 – RDTSC Truths and Myths Discussed
http://blogs.msdn.com/psssql/archive/2007/08/19/sql-server-2005-rdtsc-truths-and-myths-discussed.aspx

Finding out root cause for Cluster Failovers

Posted on December 30, 2009 by Amit Banerjee

We get quite a few issues regarding root cause analysis for Cluster Failovers. Failovers mostly happen because the IsAlive check fails for the SQL Server resource, after which one of the following two things happens:
1. SQL Server service restarts on the same cluster node
2. SQL Server resource fails over to a member cluster node

So, to look into the possible root causes of a cluster failover, a SQL version of the MPS Reports capture is required from the node on which SQL is currently active. From the data requested by the PSS Engineer, the following files are of utmost importance:
1. All the SQL Server ERRORLOGs
2. Windows Event Logs (System/Application)
3. Cluster Log

Based on the SQL Server ERRORLOGs, we would check for any errors or tell-tale signs that point to why the IsAlive check failed for the SQL Server resource. After that, we would look into the cluster log and the Windows Event Logs to correlate the events that occurred on the server around the time of the failover.

The cluster log rolls over, and the SQL Server ERRORLOGs can also roll over very quickly if a job is in place to recycle them after they reach a certain size. It is therefore a very good idea to save the cluster log and the SQL Server ERRORLOG(s) right after the failover, before they roll over and valuable data from the problem time period is overwritten.

Sometimes, a post mortem analysis gives us a hypothesis of what happened but doesn't paint the complete picture, due to lack of data from the period when the problem happened. Based on the nature of the problem, the PSS Engineer might ask you to do one of the following for the next problem occurrence:
1. Capture a lightweight PSSDIAG round the clock with file rollover, so that we can track what sort of events were happening on the SQL instance right before the failover.
2. Or capture a filtered dump of the SQL Server process during the problem period, if there is heavy blocking on the server or if the failover occurred due to memory dump(s) on the server.
3. Or capture a round-the-clock Perfmon log, if there was possible external memory pressure on the server.

NOTE: Keep in mind that the timestamps in the cluster logs are always in GMT. So you need to add/subtract the time difference between your time zone and GMT when analyzing the cluster logs.
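The time adjustment in the note above can be done by hand, or with a small query run on the server itself. The following is only a sketch: the timestamp value is a made-up example, and the query applies the server's current offset from GMT, so it assumes no daylight saving change occurred between the logged event and the time you run it.

```sql
-- Hypothetical timestamp copied from a cluster log entry (GMT).
DECLARE @cluster_log_gmt DATETIME;
SET @cluster_log_gmt = '2009-12-30T14:05:00';

-- Current difference, in minutes, between server local time and GMT.
DECLARE @offset_minutes INT;
SET @offset_minutes = DATEDIFF(MINUTE, GETUTCDATE(), GETDATE());

-- Shift the cluster log time into server local time so that it can be
-- lined up against the SQL Server ERRORLOG and Windows Event Log entries.
SELECT @cluster_log_gmt AS cluster_log_gmt,
       DATEADD(MINUTE, @offset_minutes, @cluster_log_gmt) AS server_local_time;
```

The variables are declared and assigned in separate statements because SQL Server 2005 does not allow an initial value in DECLARE.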

Allow Updates Option for SQL Server 2005

Posted on December 30, 2009 by Amit Banerjee

SQL Server 2005 no longer supports the allow updates option. So, if you execute:

sp_configure 'allow_updates', 1

and then execute RECONFIGURE, you get the following error:

Msg 5808, Level 16, State 1, Line 2
Ad hoc update to system catalogs is not supported.

After this, every change to an sp_configure setting followed by a RECONFIGURE yields this error. To rectify this, you have to change the allow_updates option back to 0 and run RECONFIGURE. As per SQL Server 2005 Books Online:

This option is still present in the sp_configure stored procedure, although its functionality is unavailable in Microsoft SQL Server 2005 (the setting has no effect). In SQL Server 2005, direct updates to the system tables are not supported.

Changing the allow updates option will cause the RECONFIGURE statement to fail. Changes to the allow updates option should be removed from all scripts.
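Resetting the option as described above might look like the following sketch. The final query is an addition of mine rather than part of the original fix; it reads sys.configurations to confirm that the configured value and the running value are both back to 0.

```sql
-- Set the option back to 0 so that RECONFIGURE stops failing.
EXEC sp_configure 'allow updates', 0;
RECONFIGURE;

-- Optional check: value and value_in_use should both show 0.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'allow updates';
```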

So, if you use allow_updates in any script on SQL Server 2005, please remove it. Updates to the system catalogs are not permitted in SQL Server 2005, and any changes made to the system Resource database would put you in an unsupported scenario.