Introducing VDC_Complete for Backup and Restore applications using SQLVDI


Cross post from Tiger team blog.

In addition to its built-in functionality for backup and restore, SQL Server is supported by a large number of third-party backup solutions. SQL Server provides application programming interfaces (APIs) that enable independent software vendors to integrate SQL Server backup and restore operations into their products. These APIs are engineered to provide maximum reliability and performance, and support the full range of SQL Server backup and restore functionality, including the full range of hot and snapshot backup capabilities. In the current implementation of the SQL Server Virtual Backup Device Interface (VDI) protocol, the last message sent from SQL Server to the VDI client will be a VDC_Flush command. To prevent data loss, the VDI client must finish the backup before responding to the VDC_Flush command. There are certain situations like during backups of filestream enabled databases where a VDC_Flush command can be sent more than once during a backup operation. For certain backup applications, processing more than one VDC_Flush might be a challenge. If the VDI client responds to a VDC_Flush command without ensuring the backup is hardened when more data is coming after the VDC_Flush, SQL Server may truncate the transaction log. However, if the backup eventually fails on the VDI client, and the transaction log is also truncated, data loss might occur. If you don’t test your log backups at regular intervals, you wouldn’t figure out that you have a broken transaction log chain till the time you need to actually execute disaster recovery.

If you want to simulate a backup for your SQL Server instance, then you use the SQL Server Backup Simulator which is available on our tigertoolbox GitHub repository. The updated SQLVDI header files required to use VDC_Complete is available on the Microsoft SQL Server Samples GitHub repository.

Improvement

A new change was introduced in SQL Server 2012, SQL Server 2014 and SQL Server 2016 to allow backup and restore applications to know when SQL Server has completed sending the data to the client (VDI) so that it can perform the necessary end of backup tasks. KB3188454 has details about the change. This update adds a new VDI command VDC_Complete that indicates SQL Server has completed sending data to the VDI client. Therefore, the VDI client will be able to finish the backup before it sends response to SQL Server. This functionality allows the VDI client to fail the backup in case something goes wrong, and also prevents the transaction log being truncated without hardening the log backup by the client application.

The improvement was designed keeping backward compatibility in mind since backup applications can target multiple releases and versions of SQL Server at the same time. There can be four different scenarios which are outlined in the table below.

SQL Server Instance (VDI Server) Backup Application (VDI Client) Behavior
Supports VDC_Complete Supports VDC_Complete Client has to request VDF_RequestComplete while fetching the configuration to let the server know that it understands VDC_Complete. Once the server sends back a confirmation using the VDI configuration that it supports VDC_Complete, the client needs to execute the appropriate code path to handle VDC_Complete
Supports VDC_Complete Does not support VDC_Complete Since client does not request VDF_RequestComplete while fetching the configuration, server proceeds using previous behavior to maintain backward compatibility
Does not support VDC_Complete Supports VDC_Complete Server will return a NULL response because it does not support VDC_Complete for the requested feature VDF_RequestComplete
Does not support VDC_Complete Does not support VDC_Complete Behaves with legacy behavior of using only VDC_Flush

VDC_Complete is available for both scenarios backup and restore. If you want to use VDC_Complete for a database restore, then that is possible as well. If you choose to do so, then you will need to negotiate (as shown in the sample below) the use of VDC_Complete before the restore while fetching the VDI configuration.

Sample Code

Let us now look at the code changes required on the client side application which will help backup application work

I am going to use references from the sample simple.cpp file available in “SQL Server Virtual Backup Device Interface (VDI) Specification”. The download location is available in the references listed at the end of this post.

A handshake was implemented for the server and client to negotiate if VDC_Complete is supported by either. This can be done by the client requesting for the VDF_RequestComplete configuration. When the server receives this feature request, it will know that the client understands VDC_Complete and will respond accordingly indicating that it supports VDC_Complete.

      // Setup the VDI configuration we want to use.

      memset (&config, 0, sizeof(config));

      config.deviceCount = 1;

 

    // Request for VDC_Complete feature from the server

    config.features = VDF_RequestComplete;

Once the client receives the configuration, it needs to check the features available (see below) by determining if VDF_CompleteEnabled is set. Once the client determines that the server supports VDC_Complete, it can execute the code path which does the appropriate processing (end of backup book keeping, closing the backup etc.) after it receives the VDC_Complete message.

    hr = vds->GetConfiguration (10000, &config);

      

    if (!SUCCEEDED (hr))

    {

             printf_s (“\nError: VDS::Getconfig fails: 0x%X\n”, hr);

        if (hr == VD_E_TIMEOUT)

        {

                    printf_s(“\nError: Failed to retrieve VDI configuration due to timeout value (10,000 ms).\n”);

        }

        goto shutdown;

    }

      

    // Determine if the server supports VDC_Complete based on configuration parameters returned

if (!(config.features & VDF_CompleteEnabled))

       {

             printf_s(“\nServer does not support VDC_Complete.”);  

       }

       else

       {

             printf_s(“\nServer supports VDC_Complete.”);

       }

      

When the backup application receives a VDC_Complete, the backup application will need to harden the backup and complete book keeping tasks before it acknowledges success for the VDC_Complete message (see below). This will ensure that SQL Server does not advance the LSN without the client application hardening the backup which could lead to a potential data loss situation.

case VDC_Complete:

// Ensure that book keeping is completed.

printf_s(“\n\nSQL Server has signaled the end of the operation.”);

// Harden the backup and close the file

       completionCode = ERROR_SUCCESS;

       break;

Reference

How It Works: SQL Server Backup Buffer Exchange (a VDI Focus)

SQL Server Virtual Backup Device Interface (VDI) Specification

SQL Server Backup Simulator

Updated SQLVDI Header files required for VDC_Complete

Advertisements

Change Tracking Cleanup


Cross post from SQL Server Tiger Blog.

Change tracking is a lightweight solution that provides an efficient change tracking mechanism for applications which was introduced in SQL Server 2008. We recently had a number of customers ask us about how Change Tracking Cleanup works and how they can troubleshoot further if the cleanup is not working as expected. In the first part of this blog post, I will explain how Change Tracking cleanup works and what “information” is cleaned up by the automatic cleanup task. I will also touch upon what enhancements were shipped in SQL Server 2014 and above to help cleanup more efficiently.

Change Tracking cleanup is invoked automatically every 30 minutes. The default retention period is 2 days. An example of setting the automatic cleanup for Change Tracking information is shown below.

ALTER DATABASE <DBNAME> SET CHANGE_TRACKING = ON  (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)

Automatic Cleanup

Each table that is enabled for Change Tracking has an internal table (a.k.a. side table with the naming convention: change_tracking_<#>) which is used by Change Tracking functions to determine the change version and the rows that have changed since a particular version. Every time the automatic cleanup thread wakes up, it scans all the user databases on the SQL Server instance to identify the change tracking enabled databases. Based on the retention period setting of the database, each side table is purged of its expired records. The automatic cleanup removes rows from the on-disk tables based on the retention period defined for the database.

Change tracking information is stored for all tables (enabled for Change Tracking) in a database in an in-memory rowstore (syscommittable). This in-memory rowstore is flushed every checkpoint to the on-disk table (syscommittab). Rows from the syscommittab internal table are removed during every checkpoint.

Manual Cleanup

In SQL Server 2014 Service Pack 2 and above, we provided a new Stored Procedure, sp_flush_CT_internal_table_on_demand, to assist with Change Tracking cleanup. KB3173157 has more details. This stored procedure accepts a table name as parameter and will attempt to cleanup records from the corresponding change tracking internal table.  During the course of the deletion, it will print some verbose in the output window about the progress of deletion.

In case you want to automate the cleanup for all tables, you can use a while loop to execute this stored procedure against all the tables or tables that receive a high number of changes to prevent automatic cleanup from lagging in cleaning up records from the Change Tracking internal tables. A sample manual cleanup T-SQL script is available on the tigertoolbox GitHub repo: ChangeTrackingCleanup.sql (see screenshot below).

In the above section, I talked about how automatic and manual cleanup happens in SQL Server. In this post, we will explore a bit more in depth on how the cleanup actually works with the help of some metadata from an actual Change Tracking implementation.

ChangeTracking auto cleanup is a background thread which wakes up at a fixed frequency and purges expired records (records beyond retention period) from the change tracking side tables. There are two cleanup versions that this thread maintains over the course of the cleanup action – invalid cleanup version and hardened cleanup version. When the thread wakes up, it determines the invalid cleanup version. The invalid cleanup version is the change tracking version which marks the point till which the auto cleanup task will perform the cleanup for the side tables. The autocleanup thread traverses through the tables that are enabled for change tracking and calls an internal stored procedure in a while loop, deleting 5000 rows in a single call within the while loop. The loop is terminated only when all the expired records in the side table are removed. This delete query uses the syscommittab table (an in-memory rowstore ) to identify the transaction IDs that have a commit timestemp less than the invalid cleanup version. This process is repeated until the cleanup is done with all change tracking side tables for that particular database. Once this is done with the final change tracking side table, it updates the hardened cleanup version to the invalid cleanup version.

Every time a checkpoint is run, an internal procedure is called that uses the hardened cleanup version and deletes a minimum of 10k records from sys.syscommittab table after they are flushed to the disk-based side tables. As you can see, both cleanup (in-memory rowstore and disk based side tables) are inter-dependent and having an issue with one of these might affect the other cleanup, eventually leading to unnecessary records in sys.syscommittab and delays in CHANGETABLE functions. See screenshot below of an extended event session tracing the checkpoint of a database which shows operations on the sys.syscommittab table.

clip_image001

Below is the output of calling the stored procedure for manual cleanup stored procedure, “sp_flush_CT_internal_table_on_demand”. I had inserted 50K rows and random updates to three tables t2, t3 and t4. The change data was cleaned up by the automatic cleanup post the retention period. Post the cleanup, I inserted another 50K rows into the table t2. After that I executed the manual cleanup procedure which did not have any cleanup to perform as the change data was within the retention period.

 

Cleanup Watermark = 103016

Internal Change Tracking table name : change_tracking_885578193

Total rows deleted: 0.

— Query to fetch cleanup version for a change tracking table

select

object_name (object_id) as table_name,

is_track_columns_updated_on,

min_valid_version,

begin_version,

cleanup_version

from sys.change_tracking_tables

The screenshot of the SSMS output grid that you see below is from the query above.

clip_image002

A new extended event, “change_tracking_cleanup”, was added to track change tracking automatic cleanup activities. The T-SQL script used to fetch the information below can be found on our tigertoolbox github repository.

As you can see from the screenshot below, the cleanup task shows you when the cleanup started and completed. Additionally, you get granular details like when the retention timestamp was updated which is an easy way of co-relating the invalid cleanup version to a timestamp value (see UpdateRetention and UpdateInvalidCleanup steps below). The side table object IDs shown below have line items reflecting the number of rows cleaned up and the start and end of the change tracking cleanup. One aspect to keep in mind is that the update retention timestamp is reflected in UTC and you will need to do the necessary conversion to get the time aligned with the server’s local timezone.

clip_image003

To summarize, we suggest the following steps when troubleshooting change tracking cleanup issues:

  1. Ensure that auto cleanup is working properly using the Extended Event “change_tracking_cleanup”

  2. If automatic cleanup is running slowly, then you can execute the stored procedure “sp_flush_CT_internal_table_on_demand”. In SQL Server 2014 Service Pack 2 and above, we provided a new Stored Procedure, sp_flush_CT_internal_table_on_demand, to assist with Change Tracking cleanup. KB3173157 has more details.

SQL Server Backup Simulator Download Location


I have received a number on questions about the download location of the SQL Server Backup Simulator. I am writing this post to let you know that the v2.0 version of the SQL Server Backup Simulator is now available on the tigertoolbox GitHub repository under Releases.

We are also working on an updated version which will account for the latest VDC_Complete enhancements.