Why you shouldn’t NUKE your transaction log file

I have seen multiple scenarios where DBAs will delete the T-LOG file of a SQL Server database because it is eating up disk space.

THIS IS WRONG AND SHOULDN’T BE DONE UNDER ANY CIRCUMSTANCE!

The transaction log of a SQL Server database maintains information about all logged activity that occurs in the database. During database startup, a recovery phase is initiated which performs the Analysis, Redo and Undo phases to bring the database into a consistent state. This entire process is dependent on the availability of the database transaction log. Without it, you are exposed to the loss of transactional consistency. Sounds like a fancy term, doesn’t it? All it means is that you can end up with data in your database tables which shouldn’t be there.

Let me elaborate on this with an example. I created a small test database which has a simple table containing two columns (record, amount), and inserted 1000 rows into the table. The T-SQL commands used are:

create database dbLogTest
go
use dbLogTest
go
create table tblTest (record int, amount bigint)
go
set nocount on
declare @cntr int
set @cntr = 1
while (@cntr <= 1000)
begin
insert into tblTest values (@cntr, @cntr * 100)
set @cntr = @cntr + 1
end

A SELECT of the first 9 records in the table looks like this:

[Image: SELECT output showing records 1 through 9, with amounts 100 through 900]

Now I start a transaction in the database to update the first 10 records with an increment of 100, and I do NOT commit the transaction. A minimal sketch of this open transaction (using the table created above) would be:
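
begin transaction
update tblTest set amount = amount + 100 where record <= 10
-- the transaction is deliberately left open: no COMMIT or ROLLBACK is issued

With the transaction still open, something happens to my SQL Server transaction log file. (Maybe someone deleted the file when SQL was offline!!) After I restart the SQL Server service and try to use the database, I get the following error in the SQL Server ERRORLOG: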

2010-05-04 18:20:25.740 spid19s      Starting up database ‘dbLogTest’.
2010-05-04 18:20:26.300 spid19s      Error: 17207, Severity: 16, State: 1.
2010-05-04 18:20:26.300 spid19s      FileMgr::StartLogFiles: Operating system error 2(The system cannot find the file specified.) occurred while creating or opening file ‘D:\Database Files\dbLogTest_log.LDF’. Diagnose and correct the operating system error, and retry the operation.
2010-05-04 18:20:26.700 spid19s      File activation failure. The physical file name "D:\Database Files\dbLogTest_log.LDF" may be incorrect.
2010-05-04 18:20:26.700 spid19s      The log cannot be rebuilt because the database was not cleanly shut down.

Since the transaction log file is not available, I cannot start the database. Now I will put the database in EMERGENCY mode and run CHECKDB on it. The commands are along these lines:
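
alter database dbLogTest set emergency
go
dbcc checkdb ('dbLogTest')
go

The CHECKDB output states that the database has no consistency errors, so the data file is intact: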

CHECKDB found 0 allocation errors and 0 consistency errors in database ‘dbLogTest’.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

You would now expect that the data in the data file is consistent and can be exported out and merged with your database backup (if one exists). Before we pass the verdict of “Everything is OK”, let’s have a look at the first 9 records in the tblTest table which we modified before killing the SQL Server service.

[Image: SELECT output showing records 1 through 9, with amounts now 200 through 1000]

And now you have all the values incremented by 100 even though you never committed the transaction. This is because a CHECKPOINT on the database would have flushed the dirty buffers to the data file. SQL Server follows Write-Ahead Logging (WAL) for the transaction log, so the log, not the data file, held the information that the database had an open, uncommitted transaction which had modified data. If the transaction log file were present, the database recovery process would have rolled back the above changes and you would have only the data which is supposed to be present in the database. The transaction log is the heart of the transactional consistency of a SQL Server database.

If your database was not cleanly shut down, you will not be able to rebuild the transaction log with the ATTACH_REBUILD_LOG clause of CREATE DATABASE. The only option would be to export the data into another database with the same schema.

Whenever CSS is left with no option other than rebuilding the transaction log to get the database online, we explain in detail how rebuilding the transaction log can affect your transactional consistency. We never recommend rebuilding the transaction log. So, the best option is always to restore the last known good database backup and reconcile the remaining portion of the data (from the last backup to the present) manually. It is a manual operation, but one that cannot be compromised on when you are handling databases in a financial institution or other critical systems, where a mishap like the one described above could prove disastrous.
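
For reference, the rebuild attempt would look something like this (the .mdf path is illustrative, based on the log file path in the ERRORLOG above):

create database dbLogTest
on (filename = 'D:\Database Files\dbLogTest.mdf') -- illustrative data file path
for attach_rebuild_log
go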

A CREATE DATABASE with ATTACH_REBUILD_LOG for the database fails with the following error:

File activation failure. The physical file name "D:\Database Files\dbLogTest_log.LDF" may be incorrect.
The log cannot be rebuilt because the database was not cleanly shut down.
Msg 1813, Level 16, State 2, Line 1
Could not open new database ‘dbLogTest’. CREATE DATABASE is aborted.

This post was an attempt at explaining how the loss of a transaction log file can affect you adversely. So, the takeaways here are:

1. Ensure that you have a tried and tested disaster recovery plan in place to recover from hardware/software failures that may occur in your environment

2. Never ever DELETE your transaction log file

Rebuilding the transaction log of a SQL Server database is one of the TOP 2 worst things that you can do as a DBA. The second is running CHECKDB with REPAIR_ALLOW_DATA_LOSS, which I have already written about in a previous post of mine.

Read more about these in Paul Randal’s posts on MSDN:

Ta da! Emergency mode repair

When should you rebuild the transaction log?

A snippet from the above post:

Product Support will recommend rebuilding the transaction log only if something has corrupted it and you have no backups (plus gently reminding you to get a backup strategy). Before they do so they’ll make sure you’re aware of the possible consequences of doing it and advise you to run DBCC CHECKDB to determine whether the database is in a structurally and transactionally consistent state. If not, you’ll most likely need to run REPAIR_ALLOW_DATA_LOSS and then begin the process of working out what data was lost (and start dealing with irate customers?)

There is no excuse for not having a backup strategy, and there is definitely no excuse for running the above NOT recommended steps when you have other options.

I have found various MVPs stressing the same point. Here is one such example from Gail [Blog].

How to generate an Automatic Failover for a Mirrored Database

I have heard this question multiple times:

How do I generate a condition which will cause an Automatic Failover of a SQL Server Mirrored database?

First of all, you would need to set up mirroring under Full Safety mode with a Witness instance.
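
As a sketch, assuming the mirrored database is named dbMirror (the name used in the ERRORLOG excerpts later in this post) and a hypothetical witness endpoint address, the high-safety settings would be applied on the Principal along these lines:

alter database dbMirror set partner safety full
go
-- the witness endpoint address below is hypothetical; substitute your witness instance
alter database dbMirror set witness = 'TCP://witnessserver.mydomain.com:5022'
go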

The only condition that triggers an automatic database failover is when the Mirror and Witness instances are unable to connect to the Principal server instance.

[Figure 1: Synchronized database mirroring session]

As per the TechNet article, Automatic Failover occurs under the following condition:

When safety is FULL, if neither the mirror nor the witness can see the principal, the mirror server can form a quorum with the witness and change its role to become a new principal server, assuming the mirroring session was SYNCHRONIZED when the principal went away.

So, the condition above is self-explanatory.

How can this be done?

1. If your Witness and Mirror are on different boxes (or on the same box), and the Principal instance is on a different box, then:

  • Stop the database mirroring endpoint on the Principal instance with: ALTER ENDPOINT <mirroring endpoint name> STATE = STOPPED. Note: this puts database mirroring into a DISCONNECTED state. To bring database mirroring out of the DISCONNECTED state, start the endpoint again with: ALTER ENDPOINT <mirroring endpoint name> STATE = STARTED (see the sketch after this list).
  • Pull out the network cable of the Principal box, provided the Principal and the (Mirror & Witness) instances are on two different boxes.
  • Block all TCP traffic on the port on which the server has established the mirroring session.
  • Pause the Principal instance service. (I have not tried this one yet, but theoretically it should work!)

2. Attach a Visual Studio Debugger or Windows Debugger to the Principal Server instance. Do not hit GO to continue the execution of the process being debugged.
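
For the endpoint option in (1) above, a minimal sketch is shown below. It assumes the mirroring endpoint is named Mirroring; query sys.tcp_endpoints first to confirm the actual name and the port (which is also the port to block for the TCP-traffic option):

-- run on the Principal: find the database mirroring endpoint name and TCP port
select name, state_desc, port
from sys.tcp_endpoints
where type_desc = 'DATABASE_MIRRORING'
go
-- stop the endpoint so that the Mirror and Witness lose contact with the Principal
alter endpoint Mirroring state = stopped
go
-- once the automatic failover has been observed, bring the endpoint back
alter endpoint Mirroring state = started
go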

THE ABOVE METHODS SHOULD ONLY BE TRIED IN A TEST ENVIRONMENT. THEY ARE NOT SOMETHING THAT SHOULD BE ATTEMPTED ON A PRODUCTION BOX.

This is just an informational article for simulating automatic database failover for testing any alert jobs or any other business logic that you need to kick off during an automatic failover.

What messages would be present in the SQL Server Errorlogs?

When the Automatic Failover is initiated, it would have the following messages printed in the SQL Server Errorlog of the Mirror instance:

2010-04-19 22:08:03.590 spid15s      Error: 1479, Severity: 16, State: 1.
2010-04-19 22:08:03.590 spid15s      The mirroring connection to “TCP://<FQDN>:5022” has timed out for database “dbMirror” after 10 seconds without a response.  Check the service and network connections.
2010-04-19 22:08:03.590 spid15s      Database mirroring is inactive for database ‘dbMirror’. This is an informational message only. No user action is required.
2010-04-19 22:08:03.590 spid15s      The mirrored database “dbMirror” is changing roles from “MIRROR” to “PRINCIPAL” due to Auto Failover.
2010-04-19 22:08:03.720 spid15s      Recovery is writing a checkpoint in database ‘dbMirror’ (15). This is an informational message only. No user action is required.

The Witness instance would have the following messages in the Errorlog:

2010-04-19 22:08:03.360 spid25s      Error: 1479, Severity: 16, State: 1.
2010-04-19 22:08:03.360 spid25s      The mirroring connection to “TCP://<FQDN>:5022” has timed out for database “dbMirror” after 10 seconds without a response.  Check the service and network connections.

On the Principal server, the system event logs might report that the SQL Server service is unresponsive based on the Service Timeout value configured on the machine (Default is 30 seconds):

Log Name:      System
Source:        Service Control Manager
Date:          4/19/2010 10:04:02 PM
Event ID:      7011
Task Category: None
Level:         Error
Keywords:      Classic
Description:
A timeout (30000 milliseconds) was reached while waiting for a transaction response from the MSSQLSERVER service.

After the Principal instance comes back online, the following messages would be printed in its Errorlog:

2010-04-19 22:28:19.190 spid26s      Error: 1474, Severity: 16, State: 1.
2010-04-19 22:28:19.190 spid26s      Database mirroring connection error 4 ‘An error occurred while receiving data: ’64(The specified network name is no longer available.)’.’ for ‘TCP://<FQDN>:5023’.
2010-04-19 22:28:19.200 spid27s      Error: 1474, Severity: 16, State: 1.
2010-04-19 22:28:19.200 spid27s      Database mirroring connection error 4 ‘An error occurred while receiving data: ’64(The specified network name is no longer available.)’.’ for ‘TCP://<FQDN>:5024’.
2010-04-19 22:28:19.360 spid20s      Database mirroring is inactive for database ‘dbMirror’. This is an informational message only. No user action is required.
2010-04-19 22:28:19.490 spid35s      Starting up database ‘dbMirror’.
2010-04-19 22:28:19.530 spid20s      The mirrored database “dbMirror” is changing roles from “PRINCIPAL” to “MIRROR” due to Role Synchronization.
2010-04-19 22:28:19.590 spid35s      Bypassing recovery for database ‘dbMirror’ because it is marked as a mirror database, which cannot be recovered. This is an informational message only. No user action is required.
2010-04-19 22:28:19.810 spid20s      Database mirroring is active with database ‘dbMirror’ as the mirror copy. This is an informational message only. No user action is required.
2010-04-19 22:28:45.440 spid20s      Database mirroring is inactive for database ‘dbMirror’. This is an informational message only. No user action is required.

If you notice above, even though the Principal was not available during the failover, it assumed the role of the Mirror after it came back online. This Role Synchronization is what keeps the Principal and Mirror database roles in sync and prevents split-brain scenarios.

References:

SQL Server Best Practices Article

Database Mirroring Best Practices and Performance

WMI Script for changing passwords of SQL Server 2008 and SQL Server 2012 instances

I had previously blogged about how to change the password of SQL Server 2005 and SQL Server 2000 instances using WMI [blog post]. I recently needed to change the service account passwords for multiple SQL Server 2008 instances. So, I thought it was high time I modified my script to change the passwords for SQL Server 2008 instances programmatically.

I just updated my script on TechNet. Here is the link. The WMI namespaces used are:

\root\Microsoft\SqlServer\ComputerManagement10 (SQL Server 2008)
\root\Microsoft\SqlServer\ComputerManagement11 (SQL Server 2012)

Each of these namespaces can also detect services from lower versions, down to SQL Server 2005. The script currently provides all the WMI namespaces available till date; you will need to modify the script or add checks in case a namespace is not present.

If you are using SQL Server 2008 R2, then you need to have this fix applied so that SetServiceAccount can work:

2397020 FIX: “Access is denied. [0x80070005]” error message occurs in SSCM when you try to change the password of an account of SQL Server 2008 R2 in Windows Vista or in a later version of Windows http://support.microsoft.com/default.aspx?scid=kb;EN-US;2397020

If you use SetServiceAccount to change the password, this will result in a SQL Server service restart. The script provided on the TechNet Gallery changes both the service account and the password. If only the password of the existing service account needs to be changed, use SetServiceAccountPassword instead; this changes the password without automatically restarting your SQL Server service.

Clarifying a misconception about RML Utilities

I remember addressing at least three different queries about RML Utilities on MSDN in the past month regarding the correlation between Batches and Statements. Yes, it is true that a Batch will have one or more T-SQL statements. But if those statements do not raise SP:StmtStarting and SP:StmtCompleted events, then the Batch will not have an entry in ReadTrace.tblStatements. This is documented in the RML Utilities help file.

You can verify whether the trace has any SP:StmtStarting or SP:StmtCompleted events using the following query:

select count(*)
from sys.fn_trace_gettable('<trace file with full path>', null)
where eventclass in (44, 45) /* 44 = SP:StmtStarting, 45 = SP:StmtCompleted */

From the help file: 

ReadTrace.tblStatements 

This table contains one row for each SP:StmtStarting or SP:StmtCompleted event in the trace. Similar to the ReadTrace.tblBatches table, the completed event is stored when it is available. Otherwise, the starting event is saved so that you can tell that a query was running when the trace was stopped. 
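
So, if none of the batches in a trace raised statement-level events, a quick sanity check against the analysis database loaded by ReadTrace (PerfAnalysis by default; adjust the name if you loaded into a different database) would show batches but no statements:

select (select count(*) from ReadTrace.tblBatches) as BatchRows,
       (select count(*) from ReadTrace.tblStatements) as StatementRows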

Steps to script out the primary key constraints or clustered indexes in a database using DMO

Why do you need this? One of the drawbacks of the “Generate Scripts” option in SQL Server Management Studio is that SSMS scripts out the Primary Key constraints along with the CREATE TABLE script. So, if the table has already been created, the primary key will not be created when you use the script generated by the SSMS Generate Scripts Wizard.

The steps that you need to follow to generate the primary key constraints or clustered indexes for user-defined tables are:

1. You will have to create the Stored Procedure from my previous blog post on the source database. 

2. Then enable Ole Automation and xp_cmdshell on the server:

EXEC sp_configure 'show advanced options', 1
reconfigure with override
go
EXEC sp_configure 'Ole Automation Procedures', 1
reconfigure with override
go
EXEC sp_configure 'xp_cmdshell', 1
reconfigure with override
go

3. Use the following scripts to generate the creation commands for the primary keys and clustered indexes of the user-defined tables in a database.

Script for generating the primary key creation script:

set nocount on
print 'DECLARE @return_value int '
select '
EXEC @return_value = [dbo].[proc_genscript]
@ServerName = ''SQLSERVER NAME'', /*Replace this with the SQL Server name*/
@DBName = ''Northwind'', /*Replace this with the database name*/
@ObjectName = ''' + [name] + ''',
@ObjectType = ''KEY'',
@TableName = ''' + object_name(parent_object_id) + ''',
@ScriptFile = ''C:\Database\' + [name] + '.sql'' /*Replace this with the directory path where you want the create scripts to be saved*/
SELECT ''Return Value'' = @return_value' + char(13)
from sys.objects
where type = 'PK'
and parent_object_id in (select id from sys.sysobjects where xtype = 'U')

Script for generating the clustered index creation script:

set nocount on
print 'DECLARE @return_value int '
select '
EXEC @return_value = [dbo].[proc_genscript]
@ServerName = ''SQLSERVER NAME'', /*Replace this with the SQL Server name*/
@DBName = ''Northwind'', /*Replace this with the database name*/
@ObjectName = ''' + [name] + ''',
@ObjectType = ''INDEX'',
@TableName = ''' + object_name([object_id]) + ''',
@ScriptFile = ''C:\Database\' + [name] + '.sql'' /*Replace this with the directory path where you want the create scripts to be saved*/
SELECT ''Return Value'' = @return_value' + char(13)
from sys.indexes
where type = 1 /* clustered indexes only */
and is_primary_key = 0 /* primary key constraints are handled by the previous script */
and objectproperty([object_id], 'IsUserTable') = 1

4. Then run the commands obtained from the above output; these generate the primary key and clustered index creation scripts in the folder that you specified. In the above example, the *.sql scripts would get created in the C:\Database folder.

5. Then use the following script to generate the SQLCMD commands for running those *.sql scripts against the target database.

create table #filenames (file_name varchar(1000))
insert into #filenames
exec xp_cmdshell 'dir <folder path>\*.sql /b'
select 'sqlcmd -S <servername> -d <databasename> -E -i ' + file_name
from #filenames
where file_name like '%sql%'
drop table #filenames

where:

<servername> = SQL Server instance name
<databasename> = database name on which you want the objects to be created
<folder path> = the folder where the .sql files were created. This has to be the same folder path where you saved the .sql files in Step 3.

6. Now put the commands obtained in Step 5 into a .bat file saved at the same location where the .sql files were created. Run the .bat file from a command prompt window, and all the script files will be executed against the database that you provided.
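
As an illustration, with hypothetical server and script file names, the .bat file would contain lines such as:

sqlcmd -S MYSERVER -d Northwind -E -i C:\Database\PK_Customers.sql
sqlcmd -S MYSERVER -d Northwind -E -i C:\Database\PK_Orders.sql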

The above set of steps loops through the index system catalogs, picks up the primary keys and clustered indexes of the user tables, and generates the corresponding creation scripts using DMO.

**** Before running this on a production database, please test out the above solution on a test database.