Differential Base and VSS Backups

VSS backups are a common way of taking SQL Server database backups using various backup utilities that are available today. However, one of the not commonly known facts is the differential backups that can be performed using VSS APIs. The SQL Server VSS Writer service is the one which facilitates the backups of SQL Server databases through VSS APIs.

The SQL writer supports differential Backup/Restore through two VSS differential mechanisms: Partial File and Differenced File by Last Modify Time.

  • Partial File The SQL writer uses the VSS Partial File mechanism for reporting changed byte ranges within its database files. 

  • Differenced File by Last Modify Time .   The SQL writer uses the VSS Differenced File by Last Modify Time mechanism for reporting changed files in full-text catalogs.

For more information, read the technical reference document on VSS backups and SQL Writer.

Once of the first points that you need to remember with VSS backups of your SQL Server databases is that a non-copy only VSS backup can break the differential chain. For more information on VSS Copy Only backups, read the blog post on the same topic.

Let me show that with an example. I will start with three full backups of my Adventureworks database in the following order:

1. Database Full backup using VSS backup
2. Database Full backup using native SQL backup
3. Database Full backup using VSS backup

When I look into the system catalogs, I find that the Differential Base LSN changes with each backup done. The differential base time is reported in GMT which is (+5:30 for my server’s time zone). The data shown below was retrieved from the sys.master_files output after each backup.

Differential Base LSN Differential Base Time
58000000764000000 5:09:54 PM
58000000767800000 5:11:55 PM
58000000770500000 5:12:21 PM

I validated the above entries with my SQL Server Errorlog as well:

2013-01-05 22:39:54.310 spid55       I/O is frozen on database AdventureWorks. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
2013-01-05 22:39:54.330 spid55       I/O was resumed on database Adventureworks. No user action is required.
2013-01-05 22:39:54.340 Backup       Database backed up. Database: AdventureWorks, creation date(time): 2012/09/21(20:50:44), pages dumped: 21978, first LSN: 58:7640:65, last LSN: 58:7668:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {‘{24635C74-D8AD-475C-88AE-831D0F31AD79}’}). This is an informational message only. No user action is required.

2013-01-05 22:42:04.650 Backup       Database backed up. Database: adventureworks, creation date(time): 2012/09/21(20:50:44), pages dumped: 22027, first LSN: 58:7678:37, last LSN: 58:7695:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {‘<directory>\adw.bak’}). This is an informational message only. No user action is required.

2013-01-05 22:42:21.300 spid55       I/O is frozen on database AdventureWorks. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
2013-01-05 22:42:21.310 spid55       I/O was resumed on database Adventureworks. No user action is required.
2013-01-05 22:42:21.310 Backup       Database backed up. Database: AdventureWorks, creation date(time): 2012/09/21(20:50:44), pages dumped: 21978, first LSN: 58:7705:37, last LSN: 58:7722:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {‘{17FAEB54-A411-4E16-BD96-FF5DE627CEDB}’}). This is an informational message only. No user action is required.

Now when I take a differential backup of the AdventureWorks database, the differential base guid is reported as 9EAAC560-B5AB-4DE6-A44B-A52E8D5BD82B and the differential base lsn is reported as 58000000770500037. This matches with the differential base guid that is reported by sys.master_files. However, this differential backup cannot be restored as the VSS backup taken was not a valid VSS backup.

The screenshot below shows the sequence of backups.

image

To summarize, if you have an application which takes VSS backups of volumes which contain SQL Server database files, then it could inadvertently invalidate your differential backups of your SQL Server databases even if you are not backing up SQL Server database files. If your VSS backup application does not use the COPY ONLY option, then the full backup of the database files taken by the snapshot backup of the volume will become your new differential base.

Reference:

SQL Server records a backup operation in the backupset history table when you use VSS to back up files on a volume
http://support.microsoft.com/kb/951288

The Ntbackup.exe operation may break the differential backup chain of a SQL Server database and may invalidate the differential backups when you perform a snapshot backup of the database
http://support.microsoft.com/kb/903643

Sysdatabases is empty–You say Whaaat?

While trying to backup a SQL Server database using SQL Writer (snapshot backups using VSS framework), then you might notice the following error in the application event logs.

Log Name: Application
Source: SQLWRITER
Event ID: 24581
Task Category: None
Level: Error
Description:
Sqllib error: System table sys.sysdatabases in SQL Server instance <SQL Server Name> is empty.

The above error is a bit misleading because it doesn’t literally mean that all your databases on the SQL Server instance reported have disappeared. If you scour the application event logs, you will find that another error is reported along with this error during the same timeframe.

Log Name: Application
Source: SQLWRITER
Event ID: 24583
Level: Error
Description:
Sqllib error: OLEDB Error encountered calling IDBInitialize::Initialize. hr = 0x80040e4d. SQLSTATE: 28000, Native Error: 18456
Error state: 1, Severity: 14
Source: Microsoft SQL Server Native Client 10.0
Error message: Login failed for user ‘NT AUTHORITY\SYSTEM’.
DBPROP_INIT_DATASOURCE: <SQL Server Name>
DBPROP_INIT_CATALOG: master
DBPROP_AUTH_INTEGRATED: SSPI

Now it is clear that the SYSTEM account is unable to log into the SQL Server instance mentioned in the error message above. If you look into the SQL Errorlog, you will find the following error message:

Error: 18456, Severity: 14, State: 11.
Login failed for user ‘NT AUTHORITY\SYSTEM’. Reason: Token-based server access validation failed with an infrastructure error. Check for previous errors. [CLIENT: <local machine>]

State 11 is basically telling you that the account doesn’t have access to the SQL Server instance. The reason for the login failure is available on SQL Server 2008 instances and above. There is a catch here the Login Failed error message is only reported in the SQL Errorlogs and Application Eventlogs only if the System account is granted access to the SQL Server instance but not granted the right permissions to get the database list from the sys.databases catalog. The hex code 0x80040e4d corresponds to an authentication failure.

So why does this happen? When you initiate a backup of a SQL Server database through SQLWriter (VSS Framework), SQLWriter will try to connect to all online instances to build a list of files associated with each database. SQLWriter/VSS needs this information to create the exclusion list. The account that the SQLWriter service uses is the NT AUTHORITY\SYSTEM account. So if you have multiple instances of SQL Server online on the same server, then the SYSTEM account needs to have SYSADMIN permissions OR the necessary permissions to run a query against the sys.databases catalog on the instance. This is a by-design requirement and documented in the KB Article mentioned below. However, if the SQL Server instance is not started, then you SQLWriter is not bothered about the instance(s) as I/O need not be frozen for database files for an instance which is shutdown.

919023    SQL Server 2005 connectivity and Volume Shadow Copy Service (VSS)
http://support.microsoft.com/default.aspx?scid=kb;EN-US;919023

Excerpt from above article:

Additionally, because of the types of operations that the writer must perform, we recommend that you do not remove the NT AUTHORITY\SYSTEM login from the sysadmin server role.

Now if you do not want to grant sysadmin permission to the System account, then you need to do the following to prevent the above error:

1. Grant db_datareader role to the SYSTEM account on the master database. This is required for other queries that the SQLWriter may need to executed during the course of the backup.
2. Grant ALTER ANY DATABASE or VIEW ANY DATABASE server-level permission, or CREATE DATABASE permission in the master database to the SYSTEM account. These are the minimum permissions required to query sys.databases table.

Normally the SYSTEM account is part of Built-in Administrators group on the box if it is not removed due to security hardening. You would need to grant the above permissions to the System account on all SQL Server instances that are installed and online on the server where you are trying to perform the VSS backup.

Another symptom of the problem manifests itself when you execute the command: vssadmin list writers. You will find that the above error message(s) are reported and the SQLWriter is not listed in the list of available writers.

One you have granted the SYSTEM account the necessary permissions, a quick way to verify that everything is working as expected is to run the vssadmin list writers command again. You might get the following output:

Writer name: ‘SqlServerWriter’
Writer Id: <GUID>
Writer Instance Id: <GUID>
State: [1] Stable
Last error: Non-retryable error

If you find that a non-retryable error is reported, then you need to restart the SQL Server VSS Writer service. Once this is done, run the command again and if there are no problems encountered, then you will get the following output for the SQLWriter.

Writer name: ‘SqlServerWriter’
Writer Id: <GUID>
Writer Instance Id: <GUID>
State: [1] Stable
Last error: No error

NOTE: If a SQL Server VSS (Snapshot) Backup is in progress for any instance on the server, then a restart of the SQL Server VSS Writer service should be deferred till the backup is completed or fails.

SQL Server Backup Simulator v2 available now

SQL Server Backup Simulator is used by CSS to troubleshoot SQLVDI related issues and to identify if the SQLVDI DLL is functioning correctly. Based on the feedback received from the use of the tool and the current troubleshooting needs, we decided to do v2 release for SQL Server Backup Simulator.

The new features for the v2 release are:

  1. Log backup – Now the tool can perform log backups. The tool performs COPY_ONLY backups so that your LSN chain is not broken.
  2. Compression support – Starting from v2, the tool will allow you to take backups with compression enabled for SQL Server 2008 and higher.

The compression option drop-down list has three-options:
a. With Compression: This option will allow you to perform a backup with compression enabled even if the server default is not to use compression for backups.
b. No Compression: This option will allow you to perform a backup with compression disabled even if the server default is to use compression for backups.
c. Server Default: This option uses the server default setting (configuration setting: backup compression default) to perform the backup.

image

Screenshot of the v2 UI

Log Restore option is not currently available in this release. The incorporation of this feature will be evaluated while planning for the next release.

A big thank you to Sakthi [Blog | Twitter] for his assistance on the v2 release.

The latest release can be downloaded here.

Previous posts for Backup Simulator:
SQL Server Backup Simulator v1.0
http://blogs.msdn.com/b/sqlserverfaq/archive/2010/10/27/sql-server-backup-simulator.aspx
SQL Server Backup Simulator v1.2
https://troubleshootingsql.com/2011/01/17/sql-server-backup-simulator-cumulative-update/

Troubleshooting that darn backup error

Backup and restore operations even though they are supposed to be a no-brainer, it can leave you wanting to tear your hair out. Considering the different elements that influence the outcome of a successful backup operation like backup media, consistency of the backup, network issues (if backing up to a UNC path), consistency of the database being backed up etc.

I shall attempt to outline a set of troubleshooting steps specifically for non-native backup/restore related errors for SQL Server databases.

Any BACKUP command that completes successfully or with errors has associated messages logged in the SQL Server Errorlog.

A successful backup from the SQL Server Errorlog:

Database backed up. Database: adventureworks, creation date(time): 2010/05/28(03:00:18), pages dumped: 17291, first LSN: 67:7733:56, last LSN: 67:7761:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {‘F:\adventureworks.bak’}).

You will see that the backup completion message has the database name, the number of pages that were contained in the backup, the date the backup was created and the number of devices and their type (in this case files). So you can reconstruct the sequence of backups even if your backup information history is missing from the MSDB system tables and you just have your SQL Server Errorlogs to play with. If you want to get real fancy about this, then you could spin up a nifty little Powershell or VBScript to parse through the SQL Errorlogs and provide the backup sequence to you as well.

Now, let’s look at a failed backup message from the SQL Errorlog.

Error: 18204, Severity: 16, State: 1.
BackupDiskFile::CreateMedia: Backup device ‘R:\adventureworks.bak’ failed to create.
Operating system error 3(The system cannot find the path specified.).
Error: 3041, Severity: 16, State: 1.
BACKUP failed to complete the command BACKUP DATABASE adventureworks.

You will notice above that the Operating System error code associated with the backup failure is reported in the error message. More often than not, the error message is self explanatory. If the error code doesn’t throw back an error text along with it, you can get the windows error code associated with the error code using net helpmsg <error number> from a command prompt window.

In the above case, there wasn’t any R: drive on my server. Now that I have finished stating the most obvious of troubleshooting methodologies for SQL native backups which was done to set the context for the next part of this post. The troubleshooting methodology for non-native backups.

Non-native SQL backups of databases use one of the following methods:

1. Use of APIs exposed through SQLVDI.dll to perform a VDI based backup
2. Use of VSS framework to perform a snapshot backup of the SQL Server database

Note: In this post, I shall not be addressing any storage level backup solutions that have options or features to handles SQL database backups.

This post is aimed at discussing the tackling of VSS/VDI related errors that you face while performing a database backup.

If a VDI backup fails, then you can try the following sequence of troubleshooting steps:

1. Attempt a backup to the same location using native SQL backup from a query window or SQLCMD. This might not always be possible as some of the backup tools do not allow any other backup application to connect to the backup share apart from the backup tool’s agents.
2. The next thing that you can do is to ensure that your SQLVDI.DLL is updated to the latest build. For this you can use the SQL Server Backup Simulator available on Code Gallery using the “Validate VDI Installation” option in the tool. This check in the tool will perform basic checks like current DLL version and additional checks based on root causes of common scenarios that CSS has seen in the past for VDI backup failures. Additional information on the usage of the tool is available here.
3. Perform a backup of the database using the tool for which the backup is failing (to the same destination if possible). The parameter tweaks at this point are not available in the current version like striped backups, changes to MAXTRANSFERSIZE/BUFFER COUNT etc. but if the backup from this tool is successful, then you know for a fact that the SQLVDI APIs are working as expected. Then it is either an additional piece of logic in the backup software that is failing or there is an external factors like network/backup media or there is a resource crunch on the SQL instance on which the database resides.


Common errors that you might see during a VDI backup operation failure:

Error message 1
BackupVirtualDeviceFile::ClearError: failure on backup device ‘VDI_ DeviceID ‘. Operating system error 995(The I/O operation has
been aborted because of either a thread exit or an application request.).

Error message 2
Error: 18210, Severity: 16, State: 1.
BackupMedium::ReportIoError: write failure on backup device ‘VDI_ DeviceID ‘. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.)

Error message 3
Error: 18210, Severity: 16, State: 1. 
BackupVirtualDeviceFile::RequestDurableMedia: Flush failure on backup device ‘VDI_ DeviceID. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.)

The above errors are mostly due to outdated versions of SQLVDI.DLL on SQL Server 2000 or SQL Server 2005 instances. The “Validate VDI Installation” option in the Backup Simulator should be able to identify any known issues with DLL version mismatch for you.


If you have an application that uses Volume Shadow Service to backup your SQL database and that backup fails, then you your troubleshooting steps would be a bit different. The tell-tale sign of a VSS Snapshot backup is in the I/O Frozen message being written into the SQL Errorlogs for the databases being backed up. In such a failure scenario, take a backup of the database using Windows Server Backup Admin (Windows Server 2008 and above) or NTBackup utility to perform a backup of the database files involved. If this backup is successful, it means that a snapshot backup is successfully completing for the database. Now depending on your backup application’s implementation logic, this might be a resource crunch or a backup storage media incompatibility or even a resource crunch.

Common errors associated with VSS backups:

Issue# 1
Error: 3041, Severity: 16, State: 1.
Backup      BACKUP failed to complete the command BACKUP DATABASE <database name>. Check the backup application log for detailed messages.
SubprocessMgr::EnqueueSubprocess: Limit on ‘Max worker threads’ reached.
This error message is raised when your backup application uses VSS to backup a large number of databases simultaneously. More information around this can be found in this post Volume Shadow barfs with 0x80040e14 code

Issue #2
While performing a Bare Metal backup you might run into issues if your SQL Server binaries are located on a different drive other than the C: drive. More information on the same is mentioned here: Location of SQL Binaries can flip out Bare Metal backups

Issue #3

Event Type:    Error
Event Source:    VSS
Event Category:    None
Event ID:    6005
Description:
Sqllib error: Database %1 is stored on multiple volumes, only some of which are being shadowed.

The above error occurs when you backup only one of the volumes that contains SQL database data/log files which are spread over multiple volumes. This is explained in detailed in this post: Why is VSS complaining about SQL Data files


More information on how VSS/VDI work is present in the blog post by Sudarshan: INFORMATIONAL- SHEDDING LIGHT on VSS & VDI Backups in SQL Server

Previous posts related to Volume Shadow Backups
IO Frozen messages while taking NT Backup for SQL databases

Previous posts related to VDI backups
SQL VDI backup fails with 0×80770007

Volume Shadow barfs with 0x80040e14 code

I was replying on a MSDN forum thread recently and found that the a VSS backup was failing due to large number of database volumes residing on a particular volume.

I re-created the issue on my box by limiting the number of Max Worker Threads to 128 on my SQL Server 2008 instance. After making the change to the number of worker threads, I created 99 databases on the D: drive of my server using the following script:

set nocount on
declare @cntr int = 1,@sql varchar(8000)
while @cntr &amp;lt; 100
begin
set @sql = 'CREATE DATABASE [db'+cast(@cntr as varchar(5))+'] ON&amp;nbsp; PRIMARY

( NAME = N' 'db'+cast(@cntr as varchar(5))+''', FILENAME = N' 'D:\DBFiles\db'+cast(@cntr as varchar(5))+'.mdf' ')
LOG ON
( NAME = N' 'db'+cast(@cntr as varchar(5))+'_log' ', FILENAME = N' 'D:\DBFiles\db'+cast(@cntr as varchar(5))+'.ldf'')'

set @cntr = @cntr + 1
exec(@sql)
end

I then attempted to backup the DBFiles folder on the D: drive using Windows Server Backup and it failed.

image

After that I looked into the SQL Server Errorlog and found the following errors:

2010-12-06 03:39:15.280 Backup       Error: 3041, Severity: 16, State: 1.2010-12-06 03:39:15.280 Backup       BACKUP failed to complete the command BACKUP DATABASE db68. Check the backup application log for detailed messages.

2010-12-06 03:39:15.370 spid53       Error: 18210, Severity: 16, State: 1.2010-12-06 03:39:15.370 spid53       BackupVirtualDeviceFile::PrepareToFreeze:  failure on backup device ‘{DB840D1B-DD17-4D73-83F1-7390F676DEE4}1’. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).

 

I had already documented earlier that the infamous 995 error message could be a problem with the sqlvdi.dll but in this case, it is not!

The next thing I did was to pick the errors from the Application Event logs and I found the following errors repeated multiple times:

Error    12/6/2010 3:39:15 AM    SQLVDI    1    None    SQLVDI: Loc=SignalAbort. Desc=Client initiates abort. ErrorCode=(0). Process=3920. Thread=11604. Client. Instance=SQL2008. VD=Global\{DB840D1B-DD17-4D73-83F1-7390F676DEE4}1_SQLVDIMemoryName_0.Error    12/6/2010 3:39:15 AM    SQLWRITER    24583    None    Sqllib error: OLEDB Error encountered calling ICommandText::Execute. hr = 0x80040e14. SQLSTATE: 42000, Native Error: 3013

Error state: 1, Severity: 16

Source: Microsoft SQL Server Native Client 10.0

Error message: BACKUP DATABASE is terminating abnormally.

SQLSTATE: 42000, Native Error: 3224

Error state: 1, Severity: 16

Source: Microsoft SQL Server Native Client 10.0

Error message: Cannot create worker thread.

If you look closely at the above error, then you will find that the following error message stands out like a sore thumb: “Cannot create worker thread”. After this issue occurs, you will find that the SQLWriter shows the following when you execute the command: vssadmin list writers from a command prompt window:

Writer name: ‘SqlServerWriter’   Writer Id: {a65faa63-5ea8-4ebc-9dbd-a0c4db26912a}

   Writer Instance Id: {9075f235-fdee-4445-851b-a69c54bd8b33}

State: [8] Failed

Last error: Non-retryable error

Ensure that you do not have any memory related errors in the SQL Server Errorlogs or any DBCC MEMORYSTATUS outputs printed in the Errorlog during the time of the backup failure. If there aren’t, then the above issue occurs due to the SQL instance hitting a worker thread limit.

The reason for this is explained by Rob Dorr (Principal SQL Server Escalation Engineer at Microsoft) in his blog post:

The controller and database worker come from the worker pool.  So if your max worker thread is configured for 255 the answer would be a max of 255 / 2 = 127 databases.    You probably won’t get to 127 with this configuration as you have system workers (lazy writer, checkpoint, log writer, and others) that already are using workers.

On my SQL instance, I have 120 database and I need about 240 worker threads to complete this operation. However, I have only 128 worker threads configured for my SQL instance. If you have Dedicated Administrator Connection enabled for the SQL instance, then you can this troubleshooting further and execute a select against the DMV sys.dm_os_schedulers (for SQL Server 2005 or above) and get an output of the following type:

status                 counts
———————- ——–

HIDDEN ONLINE          168

VISIBLE ONLINE         8

VISIBLE ONLINE (DAC)   1

Query used:

select status, count(*) as counts
from sys.dm_os_schedulers
group by status

As you will see that there are about 168 hidden online schedulers created for performing the Snapshot backups. But SQL instance in question doesn’t have sufficient work threads to facilitate this operation.

What can you do in such situations:

1. Increase the number of worker threads to account for the additional worker threads required for completing the VSS backups. This can be counter-productive as it can lead to non-Buffer Pool memory contention on 32-bit instances of SQL Server. The higher the number of max worker threads on your SQL Server instance, more the amount of non-BPool memory is consumed for the thread stack. This results in depletion of the available Buffer Pool memory.

2. Change the physical location of the SQL Server database files so that all the database files do not reside on the same volume.

3. Perform a VSS Backup with SQL Server VSS Writer service turned OFF. However, you cannot recover any SQL Server database files from such a Volume Shadow Backup.

4. Use SQL Native backups or other VDI backup tools (non-Snapshot) to perform SQL Server database backups.

Note: This also happens when you select any file on a drive which has a high number of SQL Server database files. This is because the SQL Server VSS Writer tries to enumerate the database files for the SQL Server instance that reside on the volume that has been selected for the Snapshot backup.