I/O frozen messages while taking an NTBackup of drives containing SQL databases

I recently replied to an MSDN forum post asking why the following message was showing up for SQL Server database files even though no SQL Server database file was being backed up by NTBackup.

Let’s assume that you have a folder called D:\Foo on your machine which you are backing up using NTBackup. On the same drive, you have SQL Server database files residing in another folder which are not being backed up. When you start the NTBackup job to perform the backup, you will find the following messages in the Application Event Log:

Event Type: Information
Event Source: MSSQLSERVER
Event Category: (2)
Event ID: 3197
Date:  1/2/2010
Time:  1:35:31 AM
User:  NT AUTHORITY\SYSTEM
Computer: <server name>
Description:
I/O is frozen on database msdb. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

The above message will be reported for every database on the instance which has files on the D: drive. It will be immediately followed by an equal number of messages reporting that I/O has resumed for the same set of databases.

Event Type: Information
Event Source: MSSQLSERVER
Event Category: (2)
Event ID: 3198
Date:  1/2/2010
Time:  1:35:31 AM
User:  NT AUTHORITY\SYSTEM
Computer: <server name>
Description:
I/O was resumed on database msdb. No user action is required.

The first question that would cross your mind is whether this goofy behavior is actually a bug. It is not: this is by-design behavior of the VSS (Volume Shadow Copy Service) framework. NTBackup uses VSS to perform backups. When VSS is asked to perform a snapshot backup of a volume, it asks every writer registered with the system that has files on that volume to participate. Since the drive contains SQL Server database files, the SQL Writer freezes I/O for the SQL databases while the snapshot is created. After the snapshot is created, NTBackup finds that the database files are in the exclusion list (i.e. not being backed up), so the files do not end up in the physical snapshot backup file. You can confirm this from the SQL Server Errorlog, where the following message is printed for each database whose files reside on the drive in question:

Backup Database backed up. Database: msdb, creation date(time): 2005/10/14(01:54:05), pages dumped: 1, first LSN: 319847:440:173, last LSN: 319848:24:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {'{C0903B61-9239-4747-86C4-D4ADBED76428}3'}). This is an informational message only. No user action is required.

Notice that the number of pages dumped is 1, which means that the database was not actually backed up.
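These snapshot backups also leave rows behind in the MSDB backup history tables. A quick way to list them (a sketch, assuming the standard msdb history tables; device_type 7 in backupmediafamily denotes a virtual device) is:

SELECT bs.database_name,
       bs.backup_start_date,
       bs.backup_finish_date,
       bmf.physical_device_name
FROM msdb.dbo.backupset AS bs
INNER JOIN msdb.dbo.backupmediafamily AS bmf
        ON bs.media_set_id = bmf.media_set_id
WHERE bmf.device_type = 7 -- 7 = virtual device (VDI/VSS snapshot backups)
ORDER BY bs.backup_start_date DESC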

This behavior is documented under:

951288    SQL Server records a backup operation in the backupset history table when you use VSS to back up files on a volume
http://support.microsoft.com/default.aspx?scid=kb;EN-US;951288

This should NOT be a problem unless you are running into the issue mentioned in the article below, in which case you need to disable the SQL Writer service:
937683 Error message when you try to restore a database by using SQL Server Management Studio in SQL Server 2005 after you use the Backup tool: “Restore failed for Server ‘<ServerName>’ (Microsoft.SqlServer.Smo)”
http://support.microsoft.com/default.aspx?scid=kb;EN-US;937683

This behavior will be exhibited by any application that uses VSS to back up files from a drive that contains SQL Server database files. Since SQL Server database files are not like other files on the file system, VSS invokes the SQL Writer to perform the backups of SQL database files. If your SQL Writer service is experiencing issues (you can check its state by running "vssadmin list writers" from a command prompt), then your SQL database backups performed using VSS wouldn't be trustworthy!

How to find out which object belongs to which filegroup

Recently I found the need to write a query which would tell me which table belonged to which filegroup or partition scheme in a SQL Server 2005 database. I found that a system stored procedure called "sp_objectfilegroup" exists to return the filegroup information for one table; it takes the object id of the table you are interested in as a parameter. However, by using the sys.data_spaces catalog view and tying it back to the sys.indexes output, I can find out which table belongs to which filegroup in the database:

SELECT object_name(a.object_id) AS ObjectName,
       IndexName = CASE WHEN a.name IS NULL THEN 'Heap' ELSE a.name END,
       (SELECT name FROM sys.data_spaces
        WHERE data_space_id = a.data_space_id) AS FileGroupName
FROM sys.indexes a
WHERE index_id < 2 -- 0 = heap, 1 = clustered index: where the table's data lives

I’m sure many people can find alternate ways to do this 🙂 but I thought of posting one in case I need the same script again in the future.
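On a related note, sys.data_spaces covers both filegroups and partition schemes, so a variant of the same query (my own extension, not part of the original script) can also tell you which of the two a table is stored on:

SELECT object_name(i.object_id) AS ObjectName,
       ISNULL(i.name, 'Heap') AS IndexName,
       ds.name AS DataSpaceName,
       ds.type_desc AS DataSpaceType -- e.g. ROWS_FILEGROUP or PARTITION_SCHEME
FROM sys.indexes i
INNER JOIN sys.data_spaces ds
        ON ds.data_space_id = i.data_space_id
WHERE i.index_id < 2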

Why can I not produce the same plan with a copy of the production database?

This question has been asked multiple times for SQL Server 2005/2008 environments. Well, the answer is that SQL Server has become a bit smarter when it comes to the Query Optimizer: it now takes the hardware environment and the database session state into account as well. (Sometimes the optimizer may be a bit too smart for its own good 🙂 but that is a discussion not being addressed in this post.)

To determine an optimal query plan, SQL Server 2005 and 2008 use the following information:

  1. The database metadata – the table statistics should hold the same information, i.e. the same data distribution.
  2. The hardware environment – the physical memory (RAM) and the number of processors should be identical on the two machines.
  3. The database session state – the SET options in effect for the session, since these can influence plan choice.
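Before simulating anything, it helps to check how the two servers actually differ on points 2 and 3. Here is a minimal sketch (run it on both servers and compare the output):

-- Hardware environment: CPU count and physical memory
SELECT cpu_count,
       hyperthread_ratio,
       physical_memory_in_bytes / 1048576 AS physical_memory_mb
FROM sys.dm_os_sys_info

-- Session state: the SET options in effect for the current session
DBCC USEROPTIONS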

Typically, you must simulate all of this information if you want to reproduce the behavior of the query optimizer on a test system. If you are lucky, you might end up with the same plan without 2 & 3 being satisfied. In scenarios where you don't, simulating 2 & 3 would be a good idea before running off to solve the question:

Why is Server A generating a better plan than Server B?

With the advent of virtualization, simulating the physical memory and CPU configuration is not as big a deal as it used to be. I just thought I would answer this question because many times I have been asked to explain why the same database backup doesn't produce the same plan as Server A. The first step is always to create a statistics-only copy of your database and see if you can reproduce the plan that you see on your production server. If yes, then you can safely proceed to the next step of troubleshooting the bad plan, i.e. find the most expensive part of the plan and take the necessary steps to tune it, such as adding covering indexes, defining statistics, changing the join order, adding query hints, etc.

Very often the SQL CSS team will ask for a statistics clone of your database so that they can reproduce the issue in-house. This clone does not contain any data from the tables, only the database metadata. So, if you need to troubleshoot a performance issue where you suspect the query plan to be the culprit, you can also use a statistics clone on a test server to check whether you can reproduce the so-called "bad" plan. The reason I mention a test environment is that sometimes it is not possible to troubleshoot a query performance issue on a production server. And generating a database statistics clone for a few tables is much faster than a backup and restore of the entire database.

You can use the information mentioned in the article below to create a statistics clone of your database:

How to generate a script of the necessary database metadata to create a statistics-only database in SQL Server 2005 and in SQL Server 2008
http://support.microsoft.com/?kbid=914288
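After building the statistics-only database, it is worth confirming that the cloned statistics match production before comparing plans. A minimal sanity check (the table and index names below are placeholders) is to compare the statistics header on both sides:

-- Run against both the production and the statistics-only database
-- and compare the rows, density and histogram information
DBCC SHOW_STATISTICS ('dbo.MyTable', 'IX_MyIndex') WITH STAT_HEADER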

How to find out the missing indexes on a SQL Server 2008 or 2005 instance along with the CREATE INDEX commands

I had previously blogged about missing indexes and what a useful feature this is for troubleshooting query performance issues. Here is a script for identifying the missing indexes currently reported on your instance. However, you would definitely need to show prudence in implementing the indexes reported by these DMVs, and only after a due round of testing. You would also need to make sure that nothing else is broken by implementing the new indexes. Keep in mind that these DMVs are cleared when the instance restarts, so they only reflect the workload since the last restart.

SELECT CONVERT(decimal(28,1),
           migs.avg_total_user_cost * migs.avg_user_impact
           * (migs.user_seeks + migs.user_scans)) AS improvement_measure,
       'CREATE INDEX missing_index_' + CONVERT(varchar, mig.index_group_handle)
           + '_' + CONVERT(varchar, mid.index_handle)
           + ' ON ' + mid.statement
           + ' (' + ISNULL(mid.equality_columns, '')
           + CASE WHEN mid.equality_columns IS NOT NULL
                       AND mid.inequality_columns IS NOT NULL
                  THEN ',' ELSE '' END
           + ISNULL(mid.inequality_columns, '') + ')'
           + ISNULL(' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
       migs.*,
       mid.database_id,
       object_name(mid.[object_id], mid.database_id) AS objectname
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_group_stats migs
        ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details mid
        ON mig.index_handle = mid.index_handle
WHERE CONVERT(decimal(28,1),
          migs.avg_total_user_cost * migs.avg_user_impact
          * (migs.user_seeks + migs.user_scans)) > 10
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact
         * (migs.user_seeks + migs.user_scans) DESC
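If you only want results for the database you are currently connected to, a small tweak of my own (not part of the original script) is to add one more predicate to the WHERE clause:

-- Add to the WHERE clause above to restrict the output to the current database
AND mid.database_id = DB_ID()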

How to purge MSDB history using T-SQL scripts

One of the Maintenance Plan Tasks available is the History Cleanup Task, which can help you keep the MSDB database size in check. This task uses the MSDB stored procedures sp_delete_backuphistory and sp_purge_jobhistory to clean up the historical information in the MSDB system tables.

I have seen multiple scenarios where trying to delete all the historical data in one shot has proved fatal, because the large number of logged operations makes the log file grow disproportionately (especially when the autogrow setting is set to percentage growth). In such scenarios, you are left with two choices:

1. Use the Maintenance Plan GUI to modify the oldest date for the History Cleanup Task in small steps and perform the cleanup by executing the job over shorter intervals.

OR   

2. Use the script below to perform what the Cleanup Task would do.   

The script below generates the list of commands to be executed. If you are concerned about the size of your MSDB transaction log file during the course of this purge operation, uncomment the CHECKPOINT piece in the batch below to keep the transaction log size in check. This applies if MSDB is using the SIMPLE recovery model, which is the default (I can't think of a reason why someone would want it to be FULL); a checkpoint under the SIMPLE recovery model truncates the transaction log file.

The script can also be downloaded from TechNet in case you face any formatting issues with the version below. Download Link: http://gallery.technet.microsoft.com/ScriptCenter/en-us/b3e2e276-b14a-406d-b67a-d8cbd9e1eabe

SCRIPT   


-- Declare local variables
declare @mindate datetime, @currdate datetime, @oldestdate datetime, @sql varchar(8000)

-- Find the oldest date in the backupset table
select @mindate = min(backup_start_date) from msdb..backupset

set @currdate = @mindate + 7
set @oldestdate = '2009-11-04 00:00:00.000' -- Modify this to the date up to which you want your MSDB history purged

-- While loop to generate the commands that purge the MSDB entries in weekly chunks
while (@currdate <= @oldestdate)
begin

set @sql = 'EXEC msdb..sp_delete_backuphistory ''' + cast(@currdate as varchar(20)) + ''''
print @sql

set @sql = 'EXEC msdb..sp_purge_jobhistory @oldest_date = ''' + cast(@currdate as varchar(20)) + ''''
print @sql

print char(13)

-- Optional: uncomment if you are running out of space in MSDB
--print 'use msdb' + char(13) + 'checkpoint'

-- Increment the value and move on to the next date
set @currdate = @currdate + 7 -- The time interval can be modified to suit your needs

end
-- End of while loop

-- Final purge up to the target date
set @sql = 'EXEC msdb..sp_delete_backuphistory ''' + cast(@oldestdate as varchar(20)) + ''''
print @sql

set @sql = 'EXEC msdb..sp_purge_jobhistory @oldest_date = ''' + cast(@oldestdate as varchar(20)) + ''''
print @sql
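The batch above only prints the commands so that you can review them before running them. If you would rather have it execute the purge directly, a variation of my own is to replace each print @sql with a dynamic execution:

exec (@sql) -- executes the generated command instead of just printing it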