Change Tracking Cleanup


Cross post from SQL Server Tiger Blog.

Change tracking is a lightweight solution that provides an efficient change tracking mechanism for applications which was introduced in SQL Server 2008. We recently had a number of customers ask us about how Change Tracking Cleanup works and how they can troubleshoot further if the cleanup is not working as expected. In the first part of this blog post, I will explain how Change Tracking cleanup works and what “information” is cleaned up by the automatic cleanup task. I will also touch upon what enhancements were shipped in SQL Server 2014 and above to help cleanup more efficiently.

Change Tracking cleanup is invoked automatically every 30 minutes. The default retention period is 2 days. An example of setting the automatic cleanup for Change Tracking information is shown below.

ALTER DATABASE <DBNAME> SET CHANGE_TRACKING = ON  (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)

Automatic Cleanup

Each table that is enabled for Change Tracking has an internal table (a.k.a. side table with the naming convention: change_tracking_<#>) which is used by Change Tracking functions to determine the change version and the rows that have changed since a particular version. Every time the automatic cleanup thread wakes up, it scans all the user databases on the SQL Server instance to identify the change tracking enabled databases. Based on the retention period setting of the database, each side table is purged of its expired records. The automatic cleanup removes rows from the on-disk tables based on the retention period defined for the database.

Change tracking information is stored for all tables (enabled for Change Tracking) in a database in an in-memory rowstore (syscommittable). This in-memory rowstore is flushed every checkpoint to the on-disk table (syscommittab). Rows from the syscommittab internal table are removed during every checkpoint.

Manual Cleanup

In SQL Server 2014 Service Pack 2 and above, we provided a new Stored Procedure, sp_flush_CT_internal_table_on_demand, to assist with Change Tracking cleanup. KB3173157 has more details. This stored procedure accepts a table name as parameter and will attempt to cleanup records from the corresponding change tracking internal table.  During the course of the deletion, it will print some verbose in the output window about the progress of deletion.

In case you want to automate the cleanup for all tables, you can use a while loop to execute this stored procedure against all the tables or tables that receive a high number of changes to prevent automatic cleanup from lagging in cleaning up records from the Change Tracking internal tables. A sample manual cleanup T-SQL script is available on the tigertoolbox GitHub repo: ChangeTrackingCleanup.sql (see screenshot below).

In the above section, I talked about how automatic and manual cleanup happens in SQL Server. In this post, we will explore a bit more in depth on how the cleanup actually works with the help of some metadata from an actual Change Tracking implementation.

ChangeTracking auto cleanup is a background thread which wakes up at a fixed frequency and purges expired records (records beyond retention period) from the change tracking side tables. There are two cleanup versions that this thread maintains over the course of the cleanup action – invalid cleanup version and hardened cleanup version. When the thread wakes up, it determines the invalid cleanup version. The invalid cleanup version is the change tracking version which marks the point till which the auto cleanup task will perform the cleanup for the side tables. The autocleanup thread traverses through the tables that are enabled for change tracking and calls an internal stored procedure in a while loop, deleting 5000 rows in a single call within the while loop. The loop is terminated only when all the expired records in the side table are removed. This delete query uses the syscommittab table (an in-memory rowstore ) to identify the transaction IDs that have a commit timestemp less than the invalid cleanup version. This process is repeated until the cleanup is done with all change tracking side tables for that particular database. Once this is done with the final change tracking side table, it updates the hardened cleanup version to the invalid cleanup version.

Every time a checkpoint is run, an internal procedure is called that uses the hardened cleanup version and deletes a minimum of 10k records from sys.syscommittab table after they are flushed to the disk-based side tables. As you can see, both cleanup (in-memory rowstore and disk based side tables) are inter-dependent and having an issue with one of these might affect the other cleanup, eventually leading to unnecessary records in sys.syscommittab and delays in CHANGETABLE functions. See screenshot below of an extended event session tracing the checkpoint of a database which shows operations on the sys.syscommittab table.

clip_image001

Below is the output of calling the stored procedure for manual cleanup stored procedure, “sp_flush_CT_internal_table_on_demand”. I had inserted 50K rows and random updates to three tables t2, t3 and t4. The change data was cleaned up by the automatic cleanup post the retention period. Post the cleanup, I inserted another 50K rows into the table t2. After that I executed the manual cleanup procedure which did not have any cleanup to perform as the change data was within the retention period.

 

Cleanup Watermark = 103016

Internal Change Tracking table name : change_tracking_885578193

Total rows deleted: 0.

— Query to fetch cleanup version for a change tracking table

select

object_name (object_id) as table_name,

is_track_columns_updated_on,

min_valid_version,

begin_version,

cleanup_version

from sys.change_tracking_tables

The screenshot of the SSMS output grid that you see below is from the query above.

clip_image002

A new extended event, “change_tracking_cleanup”, was added to track change tracking automatic cleanup activities. The T-SQL script used to fetch the information below can be found on our tigertoolbox github repository.

As you can see from the screenshot below, the cleanup task shows you when the cleanup started and completed. Additionally, you get granular details like when the retention timestamp was updated which is an easy way of co-relating the invalid cleanup version to a timestamp value (see UpdateRetention and UpdateInvalidCleanup steps below). The side table object IDs shown below have line items reflecting the number of rows cleaned up and the start and end of the change tracking cleanup. One aspect to keep in mind is that the update retention timestamp is reflected in UTC and you will need to do the necessary conversion to get the time aligned with the server’s local timezone.

clip_image003

To summarize, we suggest the following steps when troubleshooting change tracking cleanup issues:

  1. Ensure that auto cleanup is working properly using the Extended Event “change_tracking_cleanup”

  2. If automatic cleanup is running slowly, then you can execute the stored procedure “sp_flush_CT_internal_table_on_demand”. In SQL Server 2014 Service Pack 2 and above, we provided a new Stored Procedure, sp_flush_CT_internal_table_on_demand, to assist with Change Tracking cleanup. KB3173157 has more details.

Advertisements

Lessons learnt while using the Cloud Adapter


During the last week of August, I had blogged about how to get your on-premise database to your SQL Server instance running on an Azure virtual machine. I had run into a few issues while trying to run the wizard provided by Management Studio.

The First Stumble

This error is easy to circumvent and pretty much mentioned in the online documentation. The error message would read as:

Failed to locate a SQL Server of version 12.0.2000 or later installed on the remote machine. Please verify that a SQL Server of the same or higher version than the source SQL Server is installed on the remote machine.

The above error is self-explanatory. There is a requirement that the source database engine version be lower or equal to the version of the SQL Server instance running on Azure. Eg. You cannot deploy a database from a SQL Server 2014 instance to a SQL Server 2012 instance running on an Azure VM.

The Second Stumble

The second common error that you might run into is:

The Cloud Adapter port configuration is not valid. Verify the virtual machine endpoint configurations.

The above error will be encountered when the endpoint is not configured for the Azure virtual to accept connections from the outer realm! This can be easily rectified by adding a TCP endpoint to your Azure virtual machine for 11435 which is the port that the SQL Server Cloud Adapter Service is listening on. This is also mentioned in the online documentation. Once you have created the endpoint for your Azure virtual for your on-premise server to connect with the Cloud Adapter service, your endpoint configuration should look like the one in the screenshot below:

image

The Third Stumble

The next issue could be with permissions/authentication or it might not be as easy as it seems.

Cloud Adapter operation failed due to invalid authentication. Verify the virtual machine name, user name, and password.

So the first thing to check if you have the correct account name and password. If it is due to an authentication error, then the application event log of the Azure Virtual Machine will show the following error with the source as SQL Server Cloud Adapter service as shown in Screenshot 2. The text of the error message is mentioned below.

Access denied for user <user name>

image

The other error that you might encounter is when the SQL Server Cloud Adapter service tries to enumerate the database engines installed on the virtual machine. The error would still be talking about the authentication which is reported by the management studio wizard but a little investigation into the application event logs of the virtual machine will show the following error:

[Error] <ip address> Exception in GetSqlInstances(): SQL Server WMI provider is not available on <machine name>.. Stack trace:    at Microsoft.SqlServer.Management.Smo.Wmi.ManagedComputer.TryConnect()
   at Microsoft.SqlServer.Management.Smo.Wmi.WmiSmoObject.get_Proxy()
at Microsoft.SqlServer.Management.Smo.Wmi.WmiSmoObject.EnumChildren(String childTypeName, WmiCollectionBase coll)
at Microsoft.SqlServer.Management.Smo.Wmi.ServerInstanceCollection.InitializeChildCollection()
at Microsoft.SqlServer.Management.CloudAdapter.Tasks.GetSqlInstances()
at Microsoft.SqlServer.Management.CloudAdapter.CloudAdapter.GetSqlInstances(String username, String password). Inner Exception: Invalid namespace .

The above error clearly states that the GetSqlInstances() method failed. Microsoft.SqlServer.Management.Smo.Wmi namespace contains classes that provide programmatic access to the Windows Management Instrumentation (WMI) from an SMO application. I had talked about needing the shared management objects in an earlier post. The SQL Server 2014 WMI provider is also required which is available by installing the client connectivity components from any SQL Server 2014 setup including SQL Server Express. The components that I had installed were:

a. Client Tools Connectivity

b. Client Tools Backwards Compatibility

If you are not sure if you have the WMI provider, then look for the file “C:\Program Files (x86)\Microsoft SQL Server\120\Shared\sqlmgmproviderxpsp2up.mof“. This is the SQL Server 2014 MOF file. Another way to test if the WMI provider is working without running the wizard every time and have it fail is to run the PowerShell commands below on your Azure Virtual Machine. This script will tell you where the instance enumeration being performed by the deployment wizard will work or fail.

[System.reflection.assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo")
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SqlWmiManagement")
$m = New-Object ('Microsoft.SqlServer.Management.Smo.Wmi.ManagedComputer') '.'
foreach ($svi in $m.ServerInstances)
{
	$svi.Name;
}

This post was intended to document that common issues that you might run into while deploying a database from an on-premise SQL Server instance to a SQL Server instance running on an Azure Virtual Machine.

Sending a database to Azure


SQL Server 2014 Management Studio provides a wizard to deploy your on-premise database to a Windows Azure SQL Database server. This wizard can be used to export your on-premise database into a .bacpac file and upload it to your Azure SQL Database server. This wizard will help you to deploy your database to Windows Azure SQL Database. You may also use this wizard to deploy a Windows Azure SQL Database to a local instance of SQL Server, or to move a database from one instance of Windows Azure SQL Database to another. One advantage is that the wizard does a pre-validation check to determine if any unsupported object is present in the database which is not supported on Azure SQL Database.

This wizard can be used to deploy a database between an instance of the on-premise Database Engine and a Azure SQL Database server, or between two Azure SQL Database servers. An instance of the Database Engine must be running SQL Server 2005 Service Pack 4 (SP4) or later to work with the wizard. If a database on an instance of the Database Engine contains objects not supported on Azure SQL Database, you cannot use the wizard to deploy the database to Azure SQL Database. If a database on Azure SQL Database contains objects not supported by SQL Server, you cannot use the wizard to deploy the database to instances of SQL Server.

The first step is to launch the wizard. This is done by right clicking on a database and selecting Tasks and Deploy Database to Windows Azure SQL Database (see Screenshot 1).

image

In the Deployment Settings page (see Screenshot 2), you will need to provide the connection to your Azure SQL Database server which would be of the form <alpha numeric name>.database.windows.net. Before you are able to connect, you will need to an exception to the firewall from the Azure management portal to ensure that your on-premise machine is able to connect to your Azure SQL Database server. I had explained about this in a previous post.

image

Once you have connected to the database, you will need to specify the Azure SQL Database settings like the edition and maximum size. Ensure that you do this correctly because you are billed for your database usage on Azure.

Additionally, you will need to provide a temporary file name which will be the .bacpac file. In case you want to change the database name, you can choose to do so as well.

Once this is done, you are done with the Wizard and it will do it’s magic to export the database into a .bacpac file and import the same into an Azure SQL Database created with the same name as the one provided in the wizard.

The heavy lifting for all this activity is done by DLLs the present in C:\Program Files (x86)\Microsoft SQL Server\120\DAC\bin folder. If you want to automate this activity using command line operations, then one option is to use the SqlPackage utility which  is available in the same folder.

This can be done using the following commands:


"C:\Program Files (x86)\Microsoft SQL Server\120\DAC\bin\SqlPackage.exe" /Action:Export /SourceServerName:<server name> /SourceDatabaseName:AzureTest /TargetFile:"G:\Tempdb\Azure\AzureTest_SqlPackageExport.bacpac"

The command line output is shown below:

image

Now that we have the exported file, we need to send to the Azure SQL Database server. This can be done using the command below:


"C:\Program Files (x86)\Microsoft SQL Server\120\DAC\bin\SqlPackage.exe" /Action:Import /TargetServerName:<server name>.database.windows.net /TargetDatabaseName:AzureTest2 /TargetUser:troubleshootingsql /TargetPassword:<password> /SourceFile:"G:\Tempdb\Azure\AzureTest_SqlPackageExport.bacpac"

The output of the command is shown in the screenshot below which is similar to what you will see in the Management Studio UI:

image

To summarize the long post, we saw two ways of sending your on-premise database to an Azure SQL Database server using:

a. SQL Server Management Studio

b. SqlPackage utility

Reference:

Deploy a Database By Using a DAC
http://msdn.microsoft.com/en-us/library/jj554810.aspx