Debugging that Assert condition: Maybe Not

Last week I had shown how to debug non-yielding scheduler and deadlocked schedulers memory dumps. In this post, I shall talk about Assertions. When an assert condition check fails in the SQL Server code base, a mini-dump of the SQL Server process is created which is found by default in the SQL Server LOG folder.

An assert is basically a predicate (true-false condition) put in a program’s code by the developer which he/she thinks should always evaluate to TRUE. If this fails, then a the assert failure code written by the developer will be executed.

This is one of those scenarios where you will NOT actually need to debug the assert dump and still be able to achieve a lot without opening a debugger. So if you were hoping for some more cool debugging steps, I will have to disappoint you till my next post! Whenever an assert condition fails, the message is logged in the SQL Server Errorlog along with the failing assert condition.

Let me walk you through an example. You would see messages similar to the one below in the Windows Application Event log when an Assertion check fails:

MSSQLSERVER    Error    (2)    3624    N/A    <server name> A system assertion check has failed. Check the SQL Server error log for details. Typically, an assertion failure is caused by a software bug or data corruption. To check for database corruption, consider running DBCC CHECKDB. If you agreed to send dumps to Microsoft during setup, a mini dump will be sent to Microsoft. An update might be available from Microsoft in the latest Service Pack or in a QFE from Technical Support.
MSSQLSERVER    Error    (2)    17066    N/A    <server name> "SQL Server Assertion: File: <""logmgr.cpp"">, line=<line number> Failed Assertion = ‘!(minLSN.m_fSeqNo < lfcb->lfcb_fSeqNo)‘. This error may be timing-related. If the error persists after rerunning the statement, use DBCC CHECKDB to check the database for structural integrity, or restart the server to ensure in-memory data structures are not corrupted."

So if you are monitoring your application event logs, 17066 or 3624 is the error number that you need to monitor for. Once you are aware that an exception occurred, you can check the SQL Server Errorlog from the time period when the issue was reported and see what errors were reported. You will find the same verbiage reported in the SQL Errorlog as in the application logs. (See errorlog excerpt below)

Location: "logmgr.cpp" <line number>
Expression: !(minLSN.m_fSeqNo < lfcb->lfcb_fSeqNo)

While acting as a mirroring partner for database ‘<Database name>’, server instance ‘<database name>’ encountered error 3624, status 1, severity 20. Database mirroring will be suspended.  Try to resolve the error and resume mirroring.

Notice the additional errorlog entry reported above pertaining to Database Mirroring. I find that the database mirroring setup for an user database encountered an assertion which was reported earlier in the SQL Errorlog. The build number for the SQL Server instance in question is 5000 (SQL Server 2005). The first thing you should do is search for the assert expression [!(minLSN.m_fSeqNo < lfcb->lfcb_fSeqNo)] on Bing (Or any other search engine if you are really persistent on using a particular search engine 😉 ).

Additionally, as the error message points out, run a CHECKDB on the database in question and find out if there are any inconsistencies reported. If yes, then the first task is to restore the last known good backup or correct the inconsistencies before doing anything else.

Getting back to the search results, you will find that you get KB981273 and KB2403218 in the search results. Pay close attention to the symptoms section in the KB Article as fixes for assertion failures are very specific. If the symptoms do not match, then the KB Article you found might not apply to your situation.

Symptoms from KB981273:

In Microsoft SQL Server 2005, when you restart the SQL Server service, error messages that resemble the following are logged in the SQL Server Errorlog file.

Symptoms from KB2403218:

Consider the following scenario:

  • You create a database mirroring session between two instances of SQL Server 2005 and SQL Server 2008 and SQL Server 2008 R2 by using the High-Performance (asynchronous) mode. One instance is the principal SQL Server that contains the principal database, and the other instance is the mirror SQL Server that contains the mirror database.
  • A long delay occurs between the principal and mirror databases. For example, one of the following delays occurs:
    • The redo process of the mirror database is slower than the transfer of the transaction from the principal database to the mirror database.
    • A transaction that requires a long time runs on the principal database. For example, an ALTER INDEX REBUILD query runs on the principal database.

In this scenario, the session is suspended, and a mini-dump file is generated in the log folder on the mirror server.

As you can see that KB2403218 is what is applicable in this scenario where as the symptoms described in the first KB Article doesn’t match the issue! Next check the release vehicle for the fix for the SQL Server release that you are using and the releases affected by this issue. You will find that there are two builds for SQL Server 2005 which contain the fix for this issue:

2438344 Cumulative update package 13 for SQL Server 2005 Service Pack 3 (Build: 4315)
2489409 Cumulative update package 2 for SQL Server 2005 Service Pack 4 (Build: 5254)

The interesting point to note here is that the SQL Server version for the instance in question is above the build that had the fix for Cumulative Update #13 for SQL Server 2005 Service Pack 3 which had the issue. However, this fix was present for Service Pack 4 (Build: 5000) but was included for the Cumulative Update #2 for Service Pack 4. So, now you would need to either apply CU2 or the latest Cumulative Update package for Service Pack 4 to resolve this issue.

As always, if in doubt contact Microsoft CSS with assertion mini-dump, SQL Errorlogs and the Windows Event logs. CSS Engineers might request for additional data like profiler traces, filtered dump from the next occurrence or maybe even a repro if the available data is not sufficient to perform a complete root cause analysis and identify the problem.

To summarize:
1. Just because an assert condition matches a KB Article, it doesn’t mean that you have identified your issue.
2. Verify that the symptoms match the condition that you are facing as assert conditions might be common for two completely different issues.
3. Even though you might be on a higher build than the build version for a particular release vehicle, you still might not have the fix on the Service Pack level that you are on as the fix might have been included in a later cumulative update.
4. If you cannot find a KB Article match, don’t’ have any inconsistencies in the database and are able to reproduce the scenario consistently, then check if you can reproduce this on the latest available build for the SQL Server release that you are using or contact CSS.

Debugging is not always a necessity to investigate an issue!! Smile