IO Frozen messages while taking NT Backup for SQL databases

I recently replied on a MSDN forum post where the question was regarding why the following message was showing up for SQL database files even though no SQL Server database file was being backed up by NTBackup.

Let’s assume that you have a folder called D:\Foo on your machine which you are backing up using NTBackup. On the same driver you have SQL Server database files residing in another folder which are not being backed up. When you start the NTBackup job to perform the backup, you will find the following messages in the Application Event Logs:

Event Type: Information
Event Source: MSSQLSERVER
Event Category: (2)
Event ID: 3197
Date:  1/2/2010
Time:  1:35:31 AM
User:  NT AUTHORITY\SYSTEM
Computer: <server name>
Description:
I/O is frozen on database msdb. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

The above message will be reported for each and every database on the instance which has files on the D:drive. This will be immediately followed by an equal number of messages for I/O resuming for the same set of databases.

Event Type: Information
Event Source: MSSQLSERVER
Event Category: (2)
Event ID: 3198
Date:  1/2/2010
Time:  1:35:31 AM
User:  NT AUTHORITY\SYSTEM
Computer: <server name>
Description:
I/O was resumed on database msdb. No user action is required.

The first question that would cross your mind is whether this goofy behavior is actually a bug. This is a by-design behavior of VSS (Volume Shadow) framework. NTBackup uses VSS to perform backups. When VSS is asked to perform a snapshot backup, it will ask the Writers registered with system to kick in which have files on that particular drive. Since, the drive contains SQL database files, the snapshot is created during which SQL Writer freezes IO for the SQL databases. After the snapshot is created, NTBackup finds that the database files are in the exclusion list (i.e. not being backed up) due to which the files do not reside in the physical snapshot backup file. From the SQL Server Errorlogs, you will be able to confirm this as the following message would be printed for each database whose file reside on the drive in question:

Backup Database backed up. Database: msdb, creation date(time): 2005/10/14(01:54:05), pages dumped: 1, first LSN: 319847:440:173, last LSN: 319848:24:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {‘{C0903B61-9239-4747-86C4-D4ADBED76428}3’}). This is an informational message only. No user action is required.

Notice that the number of pages dumped is 1 which means that the database was not actually backed up.

This behavior is documented under:

951288    SQL Server records a backup operation in the backupset history table when you use VSS to back up files on a volume
http://support.microsoft.com/default.aspx?scid=kb;EN-US;951288

This should NOT be a problem unless you are running into the issue mentioned in article below in which case you need to disable SQL Writer service:
937683 Error message when you try to restore a database by using SQL Server Management Studio in SQL Server 2005 after you use the Backup tool: “Restore failed for Server ‘<ServerName>’ (Microsoft.SqlServer.Smo)”
http://support.microsoft.com/default.aspx?scid=kb;EN-US;937683

This behavior will be exhibited for any application that uses VSS to backup files from a drive that has SQL Server database files. Since, SQL Database files are not like any other files on the filesystem, VSS invokes SQL Writer to perform the backups of SQL database files. If your SQL Writer service is experiencing issues, then your SQL database backups performed using VSS wouldn’t be trustworthy!

Restore operation with LiteSpeed fails with VDI error

While performing a backup/restore operation with LiteSpeed you get the following error:
 
SQLVDI: Loc=CVDS::Close. Desc=Abnormal termination state. ErrorCode=(0). Process=11812. Thread=15828. Client. Instance=PROD. VD=Global\VDI_759A94A8-F3A4-4689-AD2F-F1DDED76218B_0_SQLVDIMemoryName_0.
SQLVDI: Loc=SignalAbort. Desc=Client initiates abort. ErrorCode=(0). Process=11812. Thread=15828. Client. Instance=PROD. VD=Global\VDI_759A94A8-F3A4-4689-AD2F-F1DDED76218B_0_SQLVDIMemoryName_0.
 
When you use simple.exe to check if the SQLVDI.DLL is functioning correctly, you find that you can perform a backup and restore without any hassles. Now, you are left with a Catch-22 situation. Is it SQL Server or is it Litespeed? How do I get the backup to restore or the database backed up. If it is the latter, then you have another option, take a SQL Native Backup Wink.
 
So, the first clue is the highlighted portion in the above error message which is pointing to a Client Side VDI call stating that the Client initiated the Abort. Now why the client initiated the Abort depends on the how the code in the application is written to generate the Abort request.
 
If you enable logging on LiteSpeed, you might find something similar to the verbose output below:
 

8/15/2009 7:39:27 AM: RESTORE log [database name] FROM VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_0′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_1′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_2′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_3′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_4′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_5′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_6′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_7′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_8′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_9′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_10′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_11′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_12′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_13′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_14′ with blocksize=65536, stats=5, maxtransfersize=1048576, buffercount=30, NORECOVERY, NOUNLOAD, STATS = 10
8/15/2009 7:39:27 AM: VDI::ConfigureVDI -> Read registry value for configuration timeout
8/15/2009 7:39:27 AM: VDI::ConfigureVDI -> Waiting 4294967s for backup/restore command to begin
8/15/2009 7:39:27 AM: hr=80004005

8/15/2009 7:39:27 AM: RESTORE LOG is terminating abnormally.
RESTORE cannot process database ‘<database name>’ because it is in use by this session. It is recommended that the master database be used when performing this operation.
8/15/2009 7:39:27 AM: SLS::ExecSQL -> Error: 12309, Msg: SQL Server has returned a failure message to LiteSpeed 2005 which has prevented the operation from succeeding.
The following message is not a LiteSpeed 2005 message. Please refer to SQL Server books online or Microsoft technical support for a solution:
8/15/2009 7:39:27 AM: SLS::ExecSQL -> End Execute Command
8/15/2009 7:39:27 AM: SLS::ExecSQL -> Wait for VDI config thread
8/15/2009 7:39:47 AM: SLS::ExecSQL -> Forced close the VDI config thread
8/15/2009 7:39:47 AM: SLS::ExecSQL -> Original Msg: RESTORE LOG is terminating abnormally.
RESTORE cannot process database ‘<database name>’ because it is in use by this session. It is recommended that the master database be used when performing this operation.
8/15/2009 7:39:47 AM: SLS::ExecSQL -> Error: 12300, Msg: RESTORE LOG is terminating abnormally.
RESTORE cannot process database ‘<database name>’ because it is in use by this session. It is recommended that the master database be used when performing this operation.
8/15/2009 7:39:47 AM: Error: 12300, Msg: SLS::ExecSQL -> user exception
8/15/2009 7:39:47 AM: Error: 12300, Msg: SLS::execSQL -> End execute with unsuccessfull return code

The above set of messages can be misleading at times. This is a known issue with LiteSpeed when they call the client side version GetConfiguration function. This error happens if any login logged into the SQL instance has their default database set to the database that is currently being restored. This problem will manifest even if the SQL Service Account’s default database is the database being restored. The problem doesn’t occur on LiteSpeed version 4.8 to version 5.1. The problem will occur ONLY if the restore is performed from the LiteSpeed GUI or SSMS (by using LiteSpeed XPROCs). This problem will occur for any kind of restore operation (Full, Differential OR Log).

One possible workaround is to extract the backup files from the Litespeed backup set using the LiteSpeed Extractor utility which would extract the backup file into Microsoft SQL native backup format. After that these backups can be restored using T-SQL commands.

A second workaround is to start the restore operation using the command line option of LiteSpeed.

 

How to backup SQL Server databases to a mapped drive

While taking backups for SQL Server databases onto a mapped drive you might get the following error:

"The system cannot find the path specified."

This is because a network share that you map using a local drive letter will not be visible to a SQL Server instance as it is running as a service. So, the SQL Server service runs in the context of the local console context with the security context of the startup account of SQL Server. Also, mapped drives are specific to a session and not visible to a service started in the local console context.

So, if you want to backup a SQL Server database to a mapped drive using a local drive letter you have the following options:

1. Run the following command from a query window  EXEC xp_cmdshell ‘net use <drivename> <share name>’ — where <drive name>: Letter used to map the drive <share name>: UNC path to the share

2. After that you should be able to backup using the mapped drive letter

3. Your Management Studio Object Explorer should be able to list the above drive

Net use documentation:

http://technet.microsoft.com/en-us/library/bb490717.aspx

The drawback here is that once the SQL Server service is restarted, the mapped drive will no longer be visible because it will be unmapped. If you want to persist the mapped drive information then you need to create a startup procedure for executing the script in Step 1.

The easiest way would be to create a backup device using the UNC path of the remote share that you want to take the database backups on. One thing you need to keep in mind that the SQL Server startup account needs to have full permissions on the remote share.

The need for a sound Backup Strategy

Backup strategy needs to be designed in such a manner that it causes minimum amount of downtime in case during a disaster recovery scenario if backups need to be restored. The backup strategy depends on the following:
1. Size of the database – This would determine how often a FULL BACKUP can be taken
2. Recovery Model – This would determine if transaction log/differential backups are possible
3. The importance of the database – This would determined how often transaction log backups need to be taken if the database is in full recovery model.

If the database is in “simple” recovery model, then the only option that we would have is to take full backups. If the database is in FULL/BULK-LOGGED recovery models, then we have the option of taking transaction log/differential backups.

Now the next question is how often to take a transaction log backup. Depending of the importance of the database, transaction log backups can be taken every 10-15 minutes or even every hour. This needs to be determined at your end. Furthermore, if the transaction log backups are being taken very frequently and a full backup is taken once a week, then it is advisable to take differential backups during the middle of the week. This would ensure that the process of restore becomes less cumbersome as after the full backup, then differential backup can be applied followed by the transactional log backups.

In case of full/bulk-logged recovery models, the following backup strategy would be ideal:

1. Full backup

a. Differential backup(s)

i. Transaction log backup(s)

b. Differential backup(s)

i. Transaction log backup(s)

2. Full backup (and repeat the above cycle)

However, any backup strategy that you design needs to be compliant with your business needs.

Furthermore, for creating sound fail-over strategies for disaster recovery scenarios, the following can be considered:
1. Log shipping

2. Database Mirroring (For SQL 2005 SP1 and above)

3. Replication (Snapshot, Transactional, Merge)

4. Failover Clustering

5. Or an in-house disaster recovery mechanism envisioned by your IT & DBA team which could be a combination of one or more of the above methods.
Also, it is considered a good practice to restore a backup on a test server and a CHECKDB run on the restored database to make sure that the backup is in good shape. This becomes of utmost importance when it is not possible to run a CHECKDB before taking a backup. In SQL Server 2005, you have the option to perform a piece meal restore which would allow you to restore up to the page level in case you have database corruption.

The following articles could be helpful:

Overview of Restore and Recovery in SQL Server
http://msdn2.microsoft.com/en-us/library/ms191253.aspx

SQL Server 2005 provides the option of performing an online restore which is available for SQL Server 2005 Enterprise Edition. Furthermore, you have option of performing Piece Meal Restores i.e. page level restores, single file restore for a secondary data file in a filegroup while keeping the database online etc.

Please refer the following for more information:
Performing Online Restores
http://msdn2.microsoft.com/en-us/library/ms188671.aspx
Storage Top 10 Best Practices
http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/storage-top-10.mspx
Using Recovery Models
http://msdn2.microsoft.com/en-us/library/aa173678(SQL.80).aspx

SQL VDI backup fails with 0x80770007

While taking a SQL Server database non-native backup using an application that calls SQLVDI.DLL, you find that the backup fails following HEX code: 0x080770007.

0x80770007 (VD_E_INSTANCE_NAME VD_ERROR) translates to: Failed to recognize the SQL Server instance name.

Then check if the following condition holds true:
There is no DEFAULT instance of SQL Server on the machine where you are trying to take a VDI backup and the SQL instance that you are connecting to perform a backup is a named instance.

If above condition is true, then the issue is with CreateEx function of the interface IClientVirtualDeviceSet2. The CreateEx function is used to create the virtual device set and has the following syntax:

HRESULT IClientVirtualDeviceSet2::CreateEx (
LPCWSTR lpInstanceName,
LPCWSTR lpName,
VDConfig* pCfg
);

The "lpInstanceName" parameter identifies the SQL Server instance to which the SQL command needs to be sent to. If the CreateEx method has NULL as the first parameter, then it would always connect to the Default instance. If the server doesn’t have a default SQL instance, then the first parameter needs to be provided with the instance name:
Eg. If you have a named instance on the server as "SERVER1\SQLINST1”, then the first parameter for CreateEx should be "SQLINST1". A way to workaround the same for simple.cpp is mentioned here.

Technorati Tags: ,,