Troubleshooting Log Shipping Issues

Log Shipping is a feature in SQL Server that lets you ship transaction log backups to a different server and restore them onto a standby database for disaster recovery or reporting purposes.

One of the fundamental differences between SQL Server 2000 and SQL Server 2005 log shipping is that SQL Server 2005 uses linked servers for communication between the primary and the monitor, and between the secondary and the monitor. If you have a remote monitor server, the log shipping jobs execute linked server queries to update information about the backup, copy and restore jobs.

So, if you find that your log shipping reports are not showing current information, then two additional things that you can do apart from the basic log shipping troubleshooting steps are:

1. Check if remote connections are enabled on the primary, secondary and monitor. For SQL Server 2000, check connectivity between the instances.

2. Check whether a linked server using the same authentication settings as your log shipping setup can be set up between the primary and the monitor, or the secondary and the monitor, depending on which part of your log shipping setup is broken. Again, this applies only to SQL Server 2005.
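The two checks above can be sketched in T-SQL. This is a minimal sketch; the linked server name LSMONITOR and the monitor instance name MONITORSRV are hypothetical placeholders, and Windows authentication is assumed:

```sql
-- 1. Verify that remote connections are enabled (run on primary/secondary/monitor)
EXEC sp_configure 'remote access';

-- 2. Create a throwaway linked server to the monitor and test it
--    (LSMONITOR and MONITORSRV are placeholder names)
EXEC sp_addlinkedserver @server = N'LSMONITOR',
                        @srvproduct = N'',
                        @provider = N'SQLNCLI',
                        @datasrc = N'MONITORSRV';
EXEC sp_serveroption N'LSMONITOR', 'rpc out', 'true';

-- A simple remote query confirms connectivity and authentication
SELECT name FROM [LSMONITOR].master.sys.databases;

-- Clean up the test linked server
EXEC sp_dropserver N'LSMONITOR';
```

If the remote query fails, the error it returns usually points directly at the broken part of the setup (name resolution, authentication or the remote access setting).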

Basic Log Shipping Troubleshooting Steps

1. Look into the SQL Server ERRORLOGs and the Application/System Event Logs for any errors related to log shipping.

2. Check the Backup, Copy & Restore job history details for any errors.

3. If you are using SQL Server 2005, check what details are being displayed in the Log Shipping Report under Standard Reports in Management Studio.

4. If you want to take your troubleshooting to the next level, you can even look into the log shipping metadata by querying the log shipping tables on the primary/secondary/monitor (if configured).

Addendum: April 26th, 2011

The log shipping configuration information can be found using the following methods:

1. Standard Reports – Transaction Log Shipping Status (right click Server Name in Management Studio-> Reports -> Standard Reports -> Transaction Log Shipping Status)
2. Use the Stored Procedure which is called by the above report: EXEC master..sp_help_log_shipping_monitor

Query to check log shipping job errors using the msdb log shipping system tables:

-- List of log shipping jobs (joining on the category name is safer than a hard-coded category_id)
SELECT j.*
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.syscategories AS c ON j.category_id = c.category_id
WHERE c.name = 'Log Shipping';

-- Job history entries reporting operating system errors
SELECT * FROM msdb.dbo.sysjobhistory
WHERE [message] LIKE '%Operating system error%'
ORDER BY [run_date], [run_time];

-- Errors recorded by the log shipping monitor
SELECT * FROM msdb.dbo.log_shipping_monitor_error_detail
WHERE [message] LIKE '%Operating system error%';

-- Restore history on the secondary
SELECT * FROM msdb.dbo.restorehistory;

Known issues with Log Shipping

1. You might find that the last backed up/copied/restored files do not reflect correctly in the log shipping reports when you use a remote monitor server. In such a scenario, check if the following issue documented in the blog post below is applicable in your case:
http://blogs.msdn.com/b/sqlserverfaq/archive/2009/03/27/transaction-log-shipping-status-report-for-monitor-server-will-not-pull-up-information-if-alias-is-used-for-monitor-server.aspx

The last copied and restored file will show up as NULL if the monitor instance is not on the same box as the secondary instance. Similarly, the last backed up file will show up as NULL if the monitor instance is not on the same box as the primary instance. Both symptoms occur when the SELECT @@SERVERNAME value was not used as the monitor server name while configuring the log shipping monitor.
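A quick way to check for this mismatch is to compare the monitor server name stored in the log shipping metadata against @@SERVERNAME. A minimal sketch; run the first query on the primary, the second on the secondary and the last on the monitor:

```sql
-- On the primary: the monitor name recorded for each log shipped database
SELECT primary_database, monitor_server
FROM msdb.dbo.log_shipping_primary_databases;

-- On the secondary: the monitor name recorded for each primary it serves
SELECT primary_server, primary_database, monitor_server
FROM msdb.dbo.log_shipping_secondary;

-- On the monitor itself: the name log shipping should have been configured with
SELECT @@SERVERNAME;
```

If the monitor_server values do not match the @@SERVERNAME of the monitor instance (for example, because an alias was used), the symptoms described in the blog post above apply.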

2. If “REMOTE ACCESS” (sp_configure shows whether it is enabled) is not enabled, or the log shipping linked server (to the monitor server) is not working from the primary and secondary servers, then the last backup file/last copy file/last restored file information will not get populated when a remote monitor server instance is used. The easiest way to identify this issue is to capture a Profiler trace (on the primary instance while the backup job is running, and on the secondary instance while the copy/restore jobs are running). Provided all “Errors and Warnings” Profiler events are captured, the trace will report errors if an update operation against the log shipping monitor tables fails.

3. Another issue that you could run into while using log shipping is orphaned users, if you have database users on the primary database mapped to SQL authenticated logins. This happens because the SIDs of the SQL authenticated logins on the primary and secondary instances are different. I documented the workaround to this issue in the following blog post: http://blogs.msdn.com/b/sqlserverfaq/archive/2009/04/13/orphaned-users-with-database-mirroring-and-log-shipping.aspx
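The classic way to detect and fix orphaned users on this era of SQL Server is sp_change_users_login. A minimal sketch; 'testdb' and 'appuser' are hypothetical names, and note that the fix can only be applied once the secondary database has been recovered (it is read-only or restoring while log shipping is active):

```sql
USE testdb;  -- hypothetical database name

-- List database users whose SIDs do not match any server login
EXEC sp_change_users_login 'Report';

-- Re-map an orphaned user to the server login of the same name
EXEC sp_change_users_login 'Auto_Fix', 'appuser';  -- hypothetical user name
```

A better long-term approach, described in the linked blog post, is to create the logins on the secondary with the same SIDs as on the primary so the users never become orphaned in the first place.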

4. When you are scripting out an existing log shipping configuration, ensure that Cumulative Update package 9 for SQL Server 2005 Service Pack 2 is applied to Management Studio. If that is already done, then use one of the options mentioned in the More Information section of the KB article below:
955693 FIX: In SQL Server 2005, the file information about the transaction log that was last copied and the file information about the transaction log that was last restored are missing
http://support.microsoft.com/default.aspx?scid=kb;EN-US;955693

5. If you have configured log shipping with STANDBY mode on SQL Server 2008 and the destination folder for the transaction log backups is on a remote server (on which the SQL Server/SQL Agent service account is not a local administrator), then the restore job will fail every time with the following error:

2008-12-12 14:44:58.53 *** Error: During startup of warm standby database ‘testdb’ (database ID 7), its standby file (‘<UNC path of the TUF file>’) was inaccessible to the RESTORE statement. The operating system error was ‘5(Access is denied.)’.

TUF = Transaction Undo File, which is required for applying the next transaction log backup. This issue is fixed in the cumulative update mentioned in the KB article below:

FIX: Error message when you use log shipping in SQL Server 2008: “During startup of warm standby database ‘<Database Name>’ (database ID <N>), its standby file (‘<File Name>’) was inaccessible to the RESTORE statement”
http://support.microsoft.com/kb/962008

6. Log shipping restore will fail if there is a snapshot or an active DBCC replica on the secondary database on which the restore is being done.
https://troubleshootingsql.com/2012/09/12/why-did-the-restore-fail-on-the-log-shipped-secondary-database/
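The blog post above describes this in detail; a quick check for snapshots of the secondary database can be sketched as follows ('testdb' is a hypothetical secondary database name):

```sql
-- Database snapshots expose the source database they were created from
SELECT name AS snapshot_name
FROM sys.databases
WHERE source_database_id = DB_ID('testdb');  -- hypothetical database name
```

Any rows returned are snapshots that must be dropped before the log restore can proceed.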

Addition: September 12, 2012

Special cases
In case you need to speed up the transaction log restore for your log shipping secondary database in standby mode, then follow the steps mentioned in this post.

In case you need to move your secondary log shipped database files to a new physical location, then you can use the steps mentioned in this post.

How to connect to SQL Server 2005 Embedded Edition

From SQL Server Management Studio, you cannot connect to a SQL Server Embedded Edition instance by just providing the server name. You need to enter the following in the server name box:

\\.\pipe\MSSQL$MICROSOFT##SSEE\sql\query

The files for an embedded edition instance should be at:
C:\WINDOWS\SYSMSI\SSEE\MSSQL.2005\MSSQL\

I have seen multiple people baffled by this question. Hence, I thought I would put up a post about it.

Using WMI to Manage SQL Server 2000 Services

There are quite a few blog posts out there about managing SQL Server 2005 services using the \\.\root\Microsoft\SqlServer\ComputerManagement WMI namespace.
However, the WMI Admin Provider is not pre-installed for SQL Server 2000. It needs to be installed separately using the WMI Admin Provider Setup available on the SQL Server 2000 setup CD under the x86\other\wmi folder.

Sample script to change SQL Server 2005 service startup account and password using WMI:
http://blogs.msdn.com/mwories/archive/2006/11/03/wmi_5F00_change_5F00_password.aspx
MSDN Documentation on Win32_Service class
http://msdn.microsoft.com/en-us/library/aa394418.aspx 

Sample Script to change a SQL Server 2000 instance startup account using root\MicrosoftSQLServer namespace: 

strComputer = "."
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\MicrosoftSQLServer")

' Obtain an instance of the class using a key property value
Set objShare = objWMIService.Get("Win32_Service.Name='MSSQL$SQL2000'")

' Obtain an InParameters object specific to the Change method
Set objInParam = objShare.Methods_("Change").inParameters.SpawnInstance_()

' Add the input parameters: the new startup account and its password
objInParam.Properties_.Item("StartName") = "LocalSystem"
objInParam.Properties_.Item("StartPassword") = ""

' Execute the method and obtain the return status.
' The OutParameters object in objOutParams is created by the provider.
Set objOutParams = objWMIService.ExecMethod("Win32_Service.Name='MSSQL$SQL2000'", "Change", objInParam)

' List the out parameters (0 indicates success)
Wscript.Echo "Out Parameters: "
Wscript.Echo "ReturnValue: " & objOutParams.ReturnValue

CPU Drift Issues

I have seen a few cases where administrators have been concerned with CPU Drift and think that the SQL Server ERRORLOG reporting the following message is a serious cause for concern:

Error message 1:
The time stamp counter of CPU on scheduler id 2 is not synchronized with other CPUs.

Error message 2:
CPU time stamp frequency has changed from 191469 to 1794177 ticks per millisecond. The new frequency will be used

The SQL Server ERRORLOG reports a variety of informational, warning and error messages, and not all of them indicate problems. This message is just telling you that the CPU time stamp frequency between one or more processors is not synchronized. How does this affect you?
Quoting from one of the articles below:

"Generally the Microsoft SQL Server support team considers drift less than several seconds, noise."

If you are concerned that the drift values are actually affecting your test results, then it would be a good idea to turn off features such as SpeedStep and PowerNow! during your testing phase. This requires changes at the BIOS level. It would also be a good idea to consult your hardware manufacturer and find out whether there are any updates that need to be installed. Once again, I reiterate: only if the drift values constantly report several seconds for prolonged periods do we have the beginning of a problem; otherwise these warnings are mostly noise.
Additionally, trace flag -T8033 can be used to suppress the drift warnings. However, please do not enable this trace flag on an instance of SQL Server 2005 unless and until you fully understand the ramifications of ignoring the drift warnings.
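The trace flag can be set as a startup parameter (-T8033) or enabled at runtime; a minimal sketch:

```sql
-- Enable trace flag 8033 globally at runtime (suppresses the drift warnings)
DBCC TRACEON (8033, -1);

-- Verify which trace flags are currently active
DBCC TRACESTATUS (-1);
```

Keep in mind that a trace flag enabled via DBCC TRACEON does not survive a service restart; add it to the startup parameters if it needs to persist.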

Related Links
SQL Server timing values may be incorrect when you use utilities or technologies that change CPU frequencies
http://support.microsoft.com/kb/931279/en-us
SQL Server 2005 SP2 will introduce new messages to the error log related to timing activities
http://blogs.msdn.com/psssql/archive/2006/11/27/sql-server-2005-sp2-will-introduce-new-messages-to-the-error-log-related-to-timing-activities.aspx
SQL Server 2005 – RDTSC Truths and Myths Discussed
http://blogs.msdn.com/psssql/archive/2007/08/19/sql-server-2005-rdtsc-truths-and-myths-discussed.aspx

Finding out root cause for Cluster Failovers

We do get quite a few issues regarding root cause analysis for cluster failovers. Failovers mostly happen because the IsAlive check fails for the SQL Server resource, after which one of two things happens:
1. The SQL Server service restarts on the same cluster node
2. The SQL Server resource fails over to another cluster node

So, to look into the possible root causes of a cluster failover, a SQL version of the MPS Reports capture is required from the node on which SQL Server is currently active. Of the data requested by the PSS engineer, the following files are of utmost importance:
1. All the SQL Server ERRORLOGs
2. Windows Event Logs (System/Application)
3. Cluster Log

Based on the SQL Server ERRORLOGs, we check for any errors or tell-tale signs that point to why the IsAlive check failed for the SQL Server resource. After that, we look into the cluster log and the Windows event logs to correlate the events on the server around the failover time.

Since the cluster log rolls over, and the SQL Server ERRORLOGs can also roll over very quickly if a job is in place to recycle them after a certain size, it is a very good idea to save the cluster log and the SQL Server ERRORLOG(s) right after the failover, to prevent valuable data from the problem time period from being overwritten.
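The ERRORLOG files can be dumped for archiving straight from a query window. A small sketch, assuming the default number of retained log files:

```sql
-- Enumerate the available ERRORLOG files (archive number, date, size)
EXEC master.dbo.xp_enumerrorlogs;

-- Read a specific ERRORLOG: 0 = current, 1 = ERRORLOG.1, and so on
EXEC master.dbo.sp_readerrorlog 0;
```

Saving the results (or simply copying the files from the LOG directory) right after the failover preserves the evidence before it rolls over.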

Sometimes a post-mortem analysis gives us a hypothesis of what happened but doesn't paint the complete picture due to lack of data from the period when the problem occurred. Based on the nature of the problem, the PSS engineer might ask you to do the following for the next problem occurrence:
1. Capture a light-weight PSSDIAG round the clock with file rollover, so that we can track what sort of events were happening on the SQL instance right before the failover.
2. Or capture a filtered dump of the SQL Server process during the problem period, if there is heavy blocking on the server or if the failover occurred due to memory dump(s) on the server.
3. Or capture a round-the-clock Perfmon log, if there is possible external memory pressure on the server.

NOTE: Keep in mind that cluster log timestamps are always in GMT. So you need to add/subtract the time difference between your time zone and GMT when analyzing the cluster logs.