Deadlock Troubleshooting

Deadlock troubleshooting has always been a bit tricky. A deadlock is a special type of blocking scenario, but blocking and deadlocking are not the same thing. Deadlocks are a natural side effect of blocking. The most typical deadlock solution is either a stored proc/app code optimization, or a change in the database schema or indexing strategy.

To capture a deadlock graph in the SQL Server ERRORLOG, use the following trace flags:

  • For SQL Server 2000: 1204 & 3605
  • For SQL Server 2005: 1222 & 3605

The trace flags can be enabled using DBCC TRACEON(<trace flag #>,-1)

Refer Bart Duncan’s Weblog for deadlock troubleshooting. Bart was an Escalation Engineer for SQL Server and his blog article spanning three parts for deadlock troubleshooting is probably the most definitive guide for this topic.

http://blogs.msdn.com/bartd/archive/2006/09/09/747119.aspx

The -T1204 output can be interpreted using the following file: http://blogs.msdn.com/bartd/attachment/747119.ashx

If you find that your SQL Server is encountering deadlocks, then what data do you need to collect:

1. SQL Server Profiler traces (Refer my server side profiler traces post for setting up profiler traces) with the following events:

  • All statement level events
  • Showplan All events because the statement that is chosen as the deadlock victim will not have a Statistics Profile Event as it would have been terminated
  • Lock: Deadlock and if you are using SQL Server 2005, then capture Deadlock Graphs as well

2. Perfmon Logs

3. SQL 2000 Blocker Script output/SQL 2005 PerfStats

Additionally, if as a developer if you are concerned about your statements being terminated due to deadlocks, then you must build a retry logic into your code for handling the deadlock and resubmitting the query. This can be done using a try-catch block within the query batch that you are executing.

My Transaction Log is Growing!!!

The transaction log growth issue… What do we do?? A few things NOT TO do when this happens:
1. Delete the LDF file of the database
2. Create a new LDF file if there is no space on the disk drives and if the SQL Server ERRORLOG is reporting 900* messages. (Personal opinion.. Seen a lot of situations where this is has gone horribly wrong on production environments)
3. Take the database offline
4. Restart the SQL Server service

Believe me the above options are exercised some times on production scenarios.

Now that the above steps are avoided, what is to be done. Simple set of steps can be done to shrink the T-LOG of the database in question using the steps mentioned in the KB Article below. My troubleshooting steps involve the following:
1. Check the current SQL Server ERRORLOG for any 900* error messages related to the T-LOG
2. If you have enough disk space and the database is not in SIMPLE recovery, make a backup of the transaction log so that the you can most of the current active virtual log files
3. Use DBCC SQLPERF (LOGSPACE) to find out the current used percentage of the transaction log for the database in question
4. If it’s SQL Server 2005, find out the log reuse value in the sys.databases catalog view output. This will tell you why a database transaction log is not being reused.
5. Find out if there are any open transactions using DBCC OPENTRAN command. KILL the transaction if you can ascertain that this open transaction can be redone or recovered later.
6. Try shrinking the T-LOG using DBCC SHRINKFILE or if it is in SIMPLE recovery, try using BACKUP LOG WITH TRUNCATE_ONLY command. But be advised that this truncate command will truncate your transaction log. OR You can even try firing a CHECKPOINT into the database and then trying to truncate the transaction log. This works for only SQL Server 2000 instances.

** I know a lot of people frown on the shrink and truncate options but when you are strapped for time, this might be your ONLY option.

If you are truncating the log, then you need to take a Full Backup and then continue your log backups (if FULL/BULK LOGGED recovery model is used) since truncation breaks your log chain. You should always try to find out the RCA for the issue (if possible and if you get the chance) because you don’t want the same (as Bruce Willis says in Die Hard Smile) thing happen to you twice!!
Another important thing is to make sure that your autogrowth considerations are taken into account. The following article should help you decide what needs to done in deciding the autogrowth of a database file: http://support.microsoft.com/kb/315512/en-us

For more information, see the following topics in SQL Server Books Online:
• Shrinking the transaction log
• DBCC SHRINKFILE (Transact-SQL)
• Truncating the transaction log

Useful articles
How to use the DBCC SHRINKFILE statement to shrink the transaction log file in SQL Server 2005
http://support.microsoft.com/kb/907511/en-us
INF: Shrinking the Transaction Log in SQL Server 2000 with DBCC SHRINKFILE
http://support.microsoft.com/kb/272318/en-us
INF: How to Shrink the SQL Server 7.0 Transaction Log
http://support.microsoft.com/kb/256650/en-us
How to stop the transaction log of a SQL Server database from growing unexpectedly
http://support.microsoft.com/kb/873235/en-us
A transaction log grows unexpectedly or becomes full on a computer that is running SQL Server
http://support.microsoft.com/kb/317375/en-us\

Common Reasons for T-LOG growth
1. Wrong autogrowth values (Mostly high percentage values)

2. Index rebuild operations for database done infrequently with database in full recovery model or with log shipping/mirroring enabled

3. Something failing in your replication topology

4. BULK insert/update operations

5. Long running transactions

In case, the above mentioned steps and articles do not help you, please call Microsoft SQL Support.

SQL Server is not configured for Remote Connections

This is one of the most common error messages when users use default settings to connect to a SQL Server Express Instance.

ERROR
Sqlcmd: Error: Microsoft SQL Native Client: An error has occurred while establishing a connection to the server. When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections.


Please refer the following article to make sure that your SQL Express Instance is configured to accept Remote Connections:
http://support.microsoft.com/kb/914277

Also, make sure from the SQL Server Configuration Manager that TCP/IP and Named Pipes are enabled and you have VIA protocol disabled.

Another thing you might want to check is if SQL Server Browser Service is up and running.

Database Mirroring with SQL Authentication

It’s quite fascinating that a database can be mirrored across two sites and a front-end application will not have a clue about a failure due to a failover partner string specified in the connection string of the application.

Database mirroring was a new feature introduced in SQL Server 2005 which has seen a sizeable amount of implementation over the years. What people have not noticed is how the application connects to the database on two completely different servers w.r.t. authentication. Most organizations use Windows Authentication and after a database failover, users are able to connect seamlessly. When SQL Authentication is used to connect to the mirrored database after a failover and this login only has logon rights on the mirrored database, then it becomes a different ballgame altogether.

SQL Server stores the information of a login using a SID value. The SID value for Windows Authentications (Domain Accounts) are the same throughout the environment hence a problem never arises here when the database is failed. But for SQL Login. Rule of thumb tells us that the login used to login to a mirrored database should exist on both instances which hosts the principal and mirror databases. So, what happens when you use a SQL Authentication after a failover?? Since, SQL Server relies on SIDs to map a database user to a SQL login. You can find the information under sys.database_users and sys.syslogins catalog views. Since, after a failover the SIDs for a SQL Authentication is bound to have a mismatch (as SQL Authentication SIDs are machine specific and not something derived from the AD as in the case of Domain Accounts), you will start getting login failed or login timeout period expired for SQL Authentication.

To resolve such an issue make sure that the SID of the SQL Authenticated user is the same on both the principal and mirror server instances.

So, if you have run into an issue of having orphaned logins after failing over your mirrored database, then you can follow the steps below to resolve the issue:

1. Use the sp_help_revlogin script to script out the SQL Login

2. Drop the user on the mirrored instance and recreated it using the script obtained above

3. You would have to failover and re-map the SQL login to the database user under user mappings because if you have a database user and drop the SQL login associated with it , then the database user becomes orphaned.

This is documented both in the whitepaper on Database Mirroring on Technet as well as Books Online which talks about resolving orphaned users issue after a database failover.

Related Article:

http://blogs.msdn.com/sqlserverfaq/archive/2009/04/13/orphaned-users-with-database-mirroring-and-log-shipping.aspx

Troubleshooting Log Shipping Issues

Log Shipping is a feature in SQL Server by which you can ship transaction log backups to a different server and restore the backups onto a standby database for disaster recovery or reporting purposes.

One of the major fundamental differences in SQL Server 2000 and SQL Server 2005 log shipping is that SQL Server 2005 uses linked servers to communicate between the primary and monitor and the secondary and monitor. The log shipping jobs are executed via linked server queries to update information about the backup, copy and restore jobs in case you have a remote monitor server.

So, if you find that your log shipping reports are defunct, then two additional things that you can do apart from the basic log shipping troubleshooting steps are:

1. Check if remote connections are enabled on the primary, secondary and monitor. For SQL Server 2000, check connectivity between the instances.

2. Check if a linked server using the same authentication settings as your log shipping setup can be setup between the primary and monitor or secondary or monitor depending on which part of your log shipping setup is broken. This is again applicable for SQL Server 2005.

Basic Log Shipping Troubleshooting Steps

1. Look into the SQL Server ERRORLOGs, Application/System Event Logs for any errors related to log shipping. Check the job history of the log shipping Backup/Copy/Restore jobs for any errors.

2. Check the Backup, Copy & Restore job history details for any errors.

3. If you are using SQL Server 2005, check what details are being displayed in the Log Shipping Report under Standard Reports in Management Studio.

4. If you want to take your troubleshooting to the next level, then you can even look into the log shipping meta data by querying the log shipping tables on the primary/secondary/monitor(if configured).

Addendum: April 26th, 2011

The log shipping configuration information can be found using the following methods:

1. Standard Reports – Transaction Log Shipping Status (right click Server Name in Management Studio-> Reports -> Standard Reports -> Transaction Log Shipping Status)
2. Use the Stored Procedure which is called by the above report: EXEC master..sp_help_log_shipping_monitor

Query to check log shipping job errors using the MSDB log shipping system tables


--List of Log Shipping jobs
SELECT * from dbo.sysjobs WHERE category_id = 6
SELECT * FROM [msdb].[dbo].[sysjobhistory]
where [message] like '%Operating system error%'
order by [run_date] , [run_time]

SELECT * FROM [msdb].[dbo].[log_shipping_monitor_error_detail]
where [message] like '%Operating system error%'

SELECT * FROM [msdb].[dbo].[restorehistory] 

Known issues with Log Shipping

1. You might find that the last backed up/copied/restored files do not reflect correctly in the log shipping reports when you use a remote monitor server. In such a scenario, check if the following issue documented in the blog post below is applicable in your case:
http://blogs.msdn.com/b/sqlserverfaq/archive/2009/03/27/transaction-log-shipping-status-report-for-monitor-server-will-not-pull-up-information-if-alias-is-used-for-monitor-server.aspx

The last copied and restored file will show up as null if the monitor instance is not on the same box as the secondary instance. The last backed up file will show up as null if the monitor instance is not on the same box as the primary instance if the select @@servername value is not used as the monitor server name while configuring the log shipping monitor.

2. If “REMOTE ACCESS” (sp_configure will show if it is enabled or not) is not enabled or the LOG SHIPPING LINKED SERVER (to the monitor server) is not working for the primary and secondary servers, then last backup file/last copy file/last restored file information will not get populated if a remote monitor server instance is being used. The easiest way to identify this issue would be to capture a profiler trace (on primary instance when the backup job is running and on the secondary instance when the copy/restore job is running). The profiler trace will report errors if an update operation pertaining to the log shipping monitor tables fails provided all “Errors and Warnings” profiler events are captured.

3. Another issue that you could run into while using Log Shipping is Orphaned Users if you have Database Users on the Primary Database mapped to SQL Authenticated Logins. This happens because the SIDs of the SQL Authenticated Users on the Primary and Secondary instance would be different. I documented the workaround to this issue in the following blog post: http://blogs.msdn.com/b/sqlserverfaq/archive/2009/04/13/orphaned-users-with-database-mirroring-and-log-shipping.aspx

4. When you are scripting out an existing log shipping configuration, ensure that you have Cumulative Update Package 9 applied for SQL Server 2005 Service Pack 2 applied for Management Studio. If that is already done, then use one of the options mentioned in the more information section in the KB Article below:
955693 FIX: In SQL Server 2005, the file information about the transaction log that was last copied and the file information about the transaction log that was last restored are missing
http://support.microsoft.com/default.aspx?scid=kb;EN-US;955693

5. If you have configured LogShipping with STANDBY mode on SQL Server 2008 and the destination folder for the TLOGS uses a remote server (on which the sqlservice/sqlagent is not a Local Admin), then the restore job will fail everytime with following error :

2008-12-12 14:44:58.53 *** Error: During startup of warm standby database ‘testdb’ (database ID 7), its standby file (‘<UNC path of the TUF file>’) was inaccessible to the RESTORE statement. The operating system error was ‘5(Access is denied.)’.

TUF = Transaction Undo File which is required for applying the next T-LOG backup. This issued is fixed in the cumulative update mentioned in the KB Article below:

FIX: Error message when you use log shipping in SQL Server 2008: “During startup of warm standby database ‘<Database Name>’ (database ID <N>), its standby file (‘<File Name>’) was inaccessible to the RESTORE statement”
http://support.microsoft.com/kb/962008

6. Log shipping restore will fail if there is a snapshot or an active DBCC replica on the secondary database on which the restore is being done.
https://troubleshootingsql.com/2012/09/12/why-did-the-restore-fail-on-the-log-shipped-secondary-database/

Addition: September 12, 2012

Special cases
In case you need to speed up the transaction log restore for your log shipping secondary database in standby mode, then follow the steps mentioned in this post.

In case you need to move your secondary log shipped database files to a new physical location, then you can use the steps mentioned in this post.