SQL Server Backup Simulator v2 available now

SQL Server Backup Simulator is used by CSS to troubleshoot SQLVDI related issues and to identify if the SQLVDI DLL is functioning correctly. Based on the feedback received from the use of the tool and the current troubleshooting needs, we decided to do v2 release for SQL Server Backup Simulator.

The new features for the v2 release are:

  1. Log backup – Now the tool can perform log backups. The tool performs COPY_ONLY backups so that your LSN chain is not broken.
  2. Compression support – Starting from v2, the tool will allow you to take backups with compression enabled for SQL Server 2008 and higher.

The compression option drop-down list has three-options:
a. With Compression: This option will allow you to perform a backup with compression enabled even if the server default is not to use compression for backups.
b. No Compression: This option will allow you to perform a backup with compression disabled even if the server default is to use compression for backups.
c. Server Default: This option uses the server default setting (configuration setting: backup compression default) to perform the backup.


Screenshot of the v2 UI

Log Restore option is not currently available in this release. The incorporation of this feature will be evaluated while planning for the next release.

A big thank you to Sakthi [Blog | Twitter] for his assistance on the v2 release.

The latest release can be downloaded here.

Previous posts for Backup Simulator:
SQL Server Backup Simulator v1.0
SQL Server Backup Simulator v1.2


Cache files–Why are you increasing in size?

Management Data Warehouse provides a mechanism to collect Perfmon and DMV outputs and store them in a SQL Server database to help you troubleshoot performance issues that you might have faced in the past. This is quite helpful for intermittent performance issues that you might face on you SQL Server instances as it might not be possible to have someone monitoring the system 24X7 to troubleshoot the performance problem when it is occurring.

Management Data Warehouse provides two ways of collecting data:

1. Cached mode – This stores the Collection Set data in a .cache file in the cache file directory which is uploaded to the MDW database based on the schedule that you defined for your Collection Set.
2. Non-cached mode – This method of collection data using a Collection Set doesn’t depend on local cache. It collects the data pertaining to the Collection Set and uploads the data based on two different configuration options: On Demand or based on a schedule.


The screen shot (left) from Books Online of the run-time processing of the MDW jobs shows how the data is uploaded to the MDW database. When you have a highly transactional system with large number of concurrent connections to your SQL Server instance or a lot of processes running on the Windows Server, the amount of data collected by the System Data Collection Sets can be quite large for a single iteration especially for the Server Activity system data collection set.

So what can happen if the .cache files become very large. Let’s say your collection frequency is set to every 15 minutes. This means that the Data Collection set will collect data every 15 minutes. The scheduled upload will then use this cache file to upload the data using the SSIS upload package(s).

Now what happens if you miss an upload schedule because the upload job ran longer than 15 minutes. This means that your .cache file now has 2 sets of data to be uploaded. This cumulative addition of successive collection sets over a period of time with missed upload schedules can cause a problem once the upload job takes more than 60 minutes. The SSIS package(s) are hard-coded to timeout after 60 minutes of execution. Then you are stuck with cache data in your cache folder which cannot be uploaded.

When the SSIS package times out after 60 minutes of execution, you will get the following error in the job history for the job associated with the upload package:

Server Activity,Error,4861,Server Activity,,Error,The thread “ExecMasterPackage” has timed out 3600 seconds after being signaled to stop.,Upload,17/05/2011 13:11:37,,17/05/2011 14:11:38,3601,<server name>,,,

So what are your options here.

Option 1
1. Change the Server Activity collection to non-cached – This change would ensure that the data is uploaded as soon as it is collected without using an interim cache file.
2. Increase the collection frequency – The collection frequency is set to default of 15 seconds. This can be increased to 30 seconds or higher to prevent very large data being collected between the scheduled upload intervals.
3. Modify the sp_purge_data stored procedure for the purge job to run faster – An updated version of sp_purge_data is available here. This will ensure that the purge job completes faster.

Option 2
Evaluate the perfmon counters being collected in the Server Activity collection set. Find out which perfmon counters are required and then create a custom collection set to capture this data with a non-cached mode for data collection.

You can create your custom perfmon collection set as the default Server Activity collection set has a large number of perfmon counters from which data is collected and uploaded.

Below is a custom collection set created for Performance Monitor counters. I have shown how to collect Server specific perfmon counters and SQL instance specific perfmon counters.

use msdb;

Begin Transaction

Begin Try

Declare @collection_set_id_1 int

Declare @collection_set_uid_2 uniqueidentifier

EXEC [dbo].[sp_syscollector_create_collection_set]

@name=N'Perfmon Counter Collection Set',


@description=N'Collects Perfmon Counters ',






@collection_set_id=@collection_set_id_1 OUTPUT,

@collection_set_uid=@collection_set_uid_2 OUTPUT

Select @collection_set_id_1, @collection_set_uid_2

Declare @collector_type_uid_3 uniqueidentifier

Select @collector_type_uid_3 = collector_type_uid From [dbo].[syscollector_collector_types] Where name = N'Performance Counters Collector Type';

Declare @collection_item_id_4 int

EXEC [dbo].[sp_syscollector_create_collection_item]

@name=N'Logical Disk Collection and SQL Server CPU',

@parameters=N'<ns:PerformanceCountersCollector xmlns:ns="DataCollectorType">

<PerformanceCounters Objects="LogicalDisk" Counters="Avg. Disk Bytes/Read" Instances="*" />

<PerformanceCounters Objects="LogicalDisk" Counters="Avg. Disk Bytes/Write" Instances="*" />

<PerformanceCounters Objects="LogicalDisk" Counters="Avg. Disk sec/Read" Instances="*" />

<PerformanceCounters Objects="LogicalDisk" Counters="Avg. Disk sec/Write" Instances="*" />

<PerformanceCounters Objects="LogicalDisk" Counters="Disk Read Bytes/sec" Instances="*" />

<PerformanceCounters Objects="LogicalDisk" Counters="Disk Write Bytes/sec" Instances="*" />

<PerformanceCounters Objects="Process" Counters="% Privileged Time" Instances="*" />

<PerformanceCounters Objects="Process" Counters="% Processor Time" Instances="*" />

<PerformanceCounters Objects="Process" Counters="% User Time" Instances="*" />

<PerformanceCounters Objects="$(INSTANCE):Buffer Manager" Counters="Page life expectancy" Instances="*" />


@collection_item_id=@collection_item_id_4 OUTPUT,




select @collection_item_id_4

Commit Transaction;

End Try

Begin Catch

Rollback Transaction;

DECLARE @ErrorMessage NVARCHAR(4000);

DECLARE @ErrorSeverity INT;

DECLARE @ErrorState INT;

DECLARE @ErrorNumber INT;


DECLARE @ErrorProcedure NVARCHAR(200);


@ErrorSeverity = ERROR_SEVERITY(),

@ErrorState = ERROR_STATE(),

@ErrorNumber = ERROR_NUMBER(),

@ErrorMessage = ERROR_MESSAGE(),

@ErrorProcedure = ISNULL(ERROR_PROCEDURE(), '-');

RAISERROR (14684, @ErrorSeverity, 1 , @ErrorNumber, @ErrorSeverity, @ErrorState, @ErrorProcedure, @ErrorLine, @ErrorMessage);

End Catch;

How to: Create a Custom Collection Set That Uses the Generic T-SQL Query Collector Type
Using SQL Server 2008 Management Data Warehouse for database monitoring in my application

T-SQL Tuesday #19: Disasters and Recovery

This month’s revolving blog party a.k.a. T-SQL Tuesday is being hosted by Allen Kin (blog | twitter). The topic for this month is Disasters and Recovery. The creation of fail-safe mechanisms is probably one of the most important facet of any IT administrator role in today’s world where online transactions have become synonymous to daily lives. When creating a DR strategy, you need to keep three things in mind:

1. RPO Recovery Point Objective
2. RTORecovery Time Objective
3. SLA – Service Level Agreements

Mike Walsh has already documented about the above three points in an earlier post on SQL University DBA Week. So what am I going to talk about in this post….. Well, since a major part of my daily job involves working on critical situations which sometimes involve disaster recovery, I will talk about some key but simple points that you should be aware of while restoring your database[s] in case your server encounters a disaster.

1. Always have backups – This point no matter how many times reiterated is still less! You should always have backups of your databases. You should store your backups on a separate media which is not the same as the disk drives which store the database files. This ensures that you don’t have a single point of failure. I have seen a lot of scenarios where the backups and the database files are stored on the same drive. Once the drive goes BOOM!!… You are left with zilch!! A bad scenario to be in!

2. Test your backups – Just taking regular backups doesn’t ensure that you will be safe when a disaster strikes. You need to restore your backups and ensure that the backups can be restored successfully. If you have an automated DR strategy in place, then it is always good to perform dry-runs to ensure that your team is well versed with the recovery process when the need arises. You don’t want to be grappling with your restore scripts during a crisis situation. The next nugget of information is to ensure that a DBCC CHECKDB on the restored database completes without any errors. Just because the restore was successful, doesn’t mean that the database is consistent!

3. Know your environment – An application doesn’t just depend on your database[s]. There might be customized connection settings, connection aliases, specific logins, database users, linked servers etc. which need to be kept handy in case you need to bring a new environment online which was a clone of your previous disaster ridden system. I have seen multiple times where the databases have been restored successfully but the logins and linked specific to the application are missing. So now you have an environment which has the application databases but other specifics pertaining to the application’s functioning are missing.

4. System databases need to be backed up also – System databases do need to be backed up as well. Eg. Without the master database backup in a disaster scenario, you will be missing the necessary logins that your application needs to login to the user database.

5. Benchmarking is very important – As I mentioned earlier, a dry-run is very important. This is primarily due to the fact that if you do not know how much time a restore is going to take, you cannot define your RTO and adhere to your agreed SLAs. A classic situation is that the application needs to be up within 4 hours but since no once tested the entire restore cycle, no one knows how long it will take to restore the set of full/differential/log backups that are available.

6. Have multiple points of failure – This is mostly considered as a good to have but in critical environments, I consider this as a must-have! A simple implementation of this would be redundancy. Keep two copies of your database backups. If one set of database backups are inconsistent, you have a redundant set of backups to fall back on. A decision taken to disk space by reducing the number of redundant copies can look very daft when you are not able to bring a production system online due to the unavailability of consistent backups.

7. Never rely on REPAIR ALLOW DATA LOSS as your savior – The REPAIR ALLOW DATA LOSS option provided with CHECKDB should always and always be your last resort! This means that when all else fails, then you resort to repair options. This repair option should never be your first option for recovering from a disaster because as the name states it always results in data loss!!

8. Know how long a CHECKDB takes to complete on the database – If you do not run CHECKDB regularly on the database for which you are creating a DR strategy, then you are inviting trouble. Always run periodic CHECKDB on your databases and note the time taken so that you have a fair estimate on how long a CHECKDB should take to complete successfully on the given database.

9. Redundant database copies – A lot of environments use Database Mirroring, Log Shipping and Replication to maintain duplicate copies of the database. If you are using any of these features to maintain copies of the existing databases, then you need to note two things: first being the latency between the primary and secondary copies. This will define your RPO as the average latency will be the amount of data loss that you should be prepared to deal with and this will also define RPO to some measure as the time taken to recover the missing data would be defined by latency. Another point to keep in mind is that if you decide to use one of the alternate database copy of the database as the new production database, then you need to ensure that you avoid certain gotchas. Example: Orphaned users for SQL Authenticated logins when you use log shipping or database mirroring.

10. Keep in mind the additional SQL Server features being used – If you are using replication or mirroring or log shipping on the primary database being recovered, then you need to account for additional steps before restoring the databases as a simple restore of a database backup for such a database will not do. Eg. Special considerations need to be followed for restoring replicated databases.

For the non-technical aspects, a disaster recovery plan should include the following:

Disaster recovery plan types include the following (from Books Online):

  1. A list of people to be contacted if a disaster occurs
  2. Information about who owns the administration of the plan
  3. A checklist of required tasks for each recovery scenario. To help you review how disaster recovery progressed, initial each task as it is completed, and indicate the time when it finished on the checklist.

The above points might seem like basics but it would be surprising that they don’t get religiously followed on some production environments!