SQL Saturday 613: Building 1 million predictions per second with R-services and SQL Server 2016



Last Saturday, I presented a session on how to use R Services with SQL Server to build an analytical workflow for banking solutions. I talked about how our customer, Jack Henry & Associates, an S&P 400 company that supports more than 11,300 financial institutions with core processing services, is leveraging the power of SQL Server and R to drive intelligent insights in their data warehousing software. Below you will find a link on how you can set up the complete solution, which you can deploy on our Data Science Virtual Machine on Azure.

Our Corporate Vice President, Joseph Sirosh, demonstrated this solution along with Jack Henry & Associates at Ignite. In this session, I talked about the nuts and bolts of building a scalable predictive engine with SQL Server using the enhancements shipped in SQL Server 2016. After this session, you will be able to build your very own scalable predictive engine on SQL Server 2016!

As always, it’s great to meet my friends and the community at SQL Saturday events!

The slide deck used for my presentation can be found on SlideShare. The Power BI dashboard and the demo scripts can be downloaded from the tigertoolbox repo on GitHub.


Book on Azure and SQL Server



My last contribution to a book was in 2012. With the advent of the cloud and my continuing work with SQL Server, I jumped at the opportunity when my friends and colleagues, Pranab Mazumdar and Sourabh Agarwal, talked to me about contributing to a book on running SQL Server on Azure.

The book “Pro SQL Server on Microsoft Azure” teaches the basics of Microsoft Azure and explains how SQL Server on Azure VMs (Infrastructure-as-a-Service) and Azure SQL Databases (Platform-as-a-Service) work. This book will show you how to deploy, operate, and maintain your data using any combination of these offerings along with your on-premises environments. You will also find some architecture details that are important for an end user to know in order to run operations using Azure.

The book is available on Apress and Amazon.

We would love to hear any feedback about the book. It could be good, bad or ugly. You will find the resources available for download on the site.

Introducing VDC_Complete for Backup and Restore applications using SQLVDI


Cross post from Tiger team blog.

In addition to its built-in functionality for backup and restore, SQL Server is supported by a large number of third-party backup solutions. SQL Server provides application programming interfaces (APIs) that enable independent software vendors to integrate SQL Server backup and restore operations into their products. These APIs are engineered to provide maximum reliability and performance, and support the full range of SQL Server backup and restore functionality, including the full range of hot and snapshot backup capabilities.

In the current implementation of the SQL Server Virtual Backup Device Interface (VDI) protocol, the last message sent from SQL Server to the VDI client is a VDC_Flush command. To prevent data loss, the VDI client must finish the backup before responding to the VDC_Flush command. In certain situations, such as backups of FILESTREAM-enabled databases, a VDC_Flush command can be sent more than once during a backup operation. For certain backup applications, processing more than one VDC_Flush might be a challenge.

If the VDI client responds to a VDC_Flush command without ensuring the backup is hardened while more data is still coming after the VDC_Flush, SQL Server may truncate the transaction log. If the backup then fails on the VDI client after the transaction log has been truncated, data loss might occur. If you don’t test your log backups at regular intervals, you won’t find out that you have a broken transaction log chain until you actually need to perform disaster recovery.

If you want to simulate a backup for your SQL Server instance, you can use the SQL Server Backup Simulator, which is available on our tigertoolbox GitHub repository. The updated SQLVDI header files required to use VDC_Complete are available on the Microsoft SQL Server Samples GitHub repository.

Improvement

A change was introduced in SQL Server 2012, SQL Server 2014 and SQL Server 2016 to let backup and restore applications know when SQL Server has finished sending data to the VDI client, so that the client can perform the necessary end-of-backup tasks. KB3188454 has details about the change. This update adds a new VDI command, VDC_Complete, which indicates that SQL Server has completed sending data to the VDI client. The VDI client can therefore finish the backup before it sends a response to SQL Server. This allows the VDI client to fail the backup if something goes wrong, and also prevents the transaction log from being truncated before the client application has hardened the log backup.
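
To make the difference concrete, here is a minimal sketch that counts how often a client would run its end-of-backup work for a given message stream. It does not use the real VDI headers: the `VdiMessage` enum and the `countFinalizations` function are hypothetical names invented for illustration, and the message streams are made-up examples of the single-flush and multi-flush cases described above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the VDI commands; the real codes live in vdi.h.
enum class VdiMessage { Write, Flush, Complete };

// Counts how many times the client would run its end-of-backup work.
// Without VDC_Complete the client cannot tell which flush is the last one,
// so this sketch assumes it finalizes on every flush. With VDC_Complete
// negotiated, it finalizes only when the server signals completion.
int countFinalizations(const std::vector<VdiMessage>& stream,
                       bool completeNegotiated) {
    int finalizations = 0;
    for (VdiMessage message : stream) {
        if (completeNegotiated && message == VdiMessage::Complete) {
            ++finalizations;
        } else if (!completeNegotiated && message == VdiMessage::Flush) {
            ++finalizations;
        }
    }
    return finalizations;
}
```

With a stream that contains two flushes, the flush-only client would attempt to finalize twice, while the client that negotiated VDC_Complete would finalize exactly once, at the end.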

The improvement was designed keeping backward compatibility in mind since backup applications can target multiple releases and versions of SQL Server at the same time. There can be four different scenarios which are outlined in the table below.

SQL Server Instance (VDI Server) | Backup Application (VDI Client) | Behavior
--- | --- | ---
Supports VDC_Complete | Supports VDC_Complete | Client has to request VDF_RequestComplete while fetching the configuration to let the server know that it understands VDC_Complete. Once the server sends back a confirmation using the VDI configuration that it supports VDC_Complete, the client needs to execute the appropriate code path to handle VDC_Complete.
Supports VDC_Complete | Does not support VDC_Complete | Since the client does not request VDF_RequestComplete while fetching the configuration, the server proceeds using the previous behavior to maintain backward compatibility.
Does not support VDC_Complete | Supports VDC_Complete | Server will return a NULL response for the requested feature VDF_RequestComplete because it does not support VDC_Complete.
Does not support VDC_Complete | Does not support VDC_Complete | Legacy behavior of using only VDC_Flush.
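
The four scenarios can be summarized as a small decision function. This is only a sketch of the compatibility matrix, not code from the VDI samples: `Mode`, `Outcome` and `negotiate` are hypothetical names, and the sketch assumes that a client whose request is not confirmed falls back to the flush-only behavior.

```cpp
#include <cassert>

// Hypothetical names used only to model the compatibility matrix.
enum class Mode { UseComplete, FlushOnly };

struct Outcome {
    bool clientRequests;  // client asked for VDF_RequestComplete
    bool serverConfirms;  // server reported VDF_CompleteEnabled
    Mode mode;            // protocol both sides end up using
};

// A client that understands VDC_Complete always requests it; the server
// confirms only if it received the request and supports the feature.
Outcome negotiate(bool serverSupports, bool clientSupports) {
    const bool requests = clientSupports;
    const bool confirms = requests && serverSupports;
    return {requests, confirms,
            confirms ? Mode::UseComplete : Mode::FlushOnly};
}
```

Only the first row of the table ends in `Mode::UseComplete`; every other combination resolves to the flush-only path, which is what keeps older clients and older servers working unchanged.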

VDC_Complete is available for both backup and restore. If you want to use VDC_Complete for a database restore, you will need to negotiate its use (as shown in the sample below) before the restore, while fetching the VDI configuration.

Sample Code

Let us now look at the code changes required in the client-side application so that a backup application can work with VDC_Complete.

I am going to use references from the sample simple.cpp file available in “SQL Server Virtual Backup Device Interface (VDI) Specification”. The download location is available in the references listed at the end of this post.

A handshake was implemented for the server and client to negotiate whether both support VDC_Complete. The client starts it by requesting the VDF_RequestComplete feature in the configuration. When the server receives this feature request, it knows that the client understands VDC_Complete and responds accordingly, indicating that it supports VDC_Complete.

    // Set up the VDI configuration we want to use.
    memset (&config, 0, sizeof(config));
    config.deviceCount = 1;

    // Request the VDC_Complete feature from the server
    config.features = VDF_RequestComplete;

Once the client receives the configuration, it needs to check the features available (see below) by determining if VDF_CompleteEnabled is set. Once the client determines that the server supports VDC_Complete, it can execute the code path that does the appropriate processing (end-of-backup bookkeeping, closing the backup, etc.) after it receives the VDC_Complete message.

    hr = vds->GetConfiguration (10000, &config);

    if (!SUCCEEDED (hr))
    {
        printf_s ("\nError: VDS::Getconfig fails: 0x%X\n", hr);
        if (hr == VD_E_TIMEOUT)
        {
            printf_s ("\nError: Failed to retrieve VDI configuration due to timeout value (10,000 ms).\n");
        }
        goto shutdown;
    }

    // Determine if the server supports VDC_Complete based on configuration parameters returned
    if (!(config.features & VDF_CompleteEnabled))
    {
        printf_s ("\nServer does not support VDC_Complete.");
    }
    else
    {
        printf_s ("\nServer supports VDC_Complete.");
    }

      

When the backup application receives a VDC_Complete, it needs to harden the backup and complete its bookkeeping tasks before it acknowledges success for the VDC_Complete message (see below). This ensures that SQL Server does not advance the LSN before the client application has hardened the backup, which could otherwise lead to data loss.

    case VDC_Complete:
        // Ensure that bookkeeping is completed.
        printf_s ("\n\nSQL Server has signaled the end of the operation.");
        // Harden the backup and close the file
        completionCode = ERROR_SUCCESS;
        break;
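
The case above always reports success. As a rough illustration of why the ordering matters, the following self-contained sketch reports success for the completion message only if hardening actually succeeded. Nothing here comes from vdi.h or simple.cpp: `Command`, `BackupSink`, `handleCommand`, `kSuccess` and `kFailure` are hypothetical stand-ins, and treating a flush as "acknowledge without finalizing" is an assumption made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; the real command and error codes come from the
// VDI and Windows headers.
enum class Command { Write, Flush, Complete };
constexpr std::uint32_t kSuccess = 0;  // plays the role of ERROR_SUCCESS
constexpr std::uint32_t kFailure = 1;  // any non-zero completion code

// Toy backup target that records bytes and can be told to fail hardening.
struct BackupSink {
    bool hardenSucceeds = true;
    bool hardened = false;
    std::size_t bytesWritten = 0;

    bool harden() {
        hardened = hardenSucceeds;
        return hardened;
    }
};

// Returns the completion code the client would report for one command.
std::uint32_t handleCommand(Command cmd, std::size_t size, BackupSink& sink) {
    switch (cmd) {
    case Command::Write:
        sink.bytesWritten += size;
        return kSuccess;
    case Command::Flush:
        // More data may still follow, so do not finalize here.
        return kSuccess;
    case Command::Complete:
        // Harden first; only then acknowledge success.
        return sink.harden() ? kSuccess : kFailure;
    }
    return kFailure;  // unknown command
}
```

If hardening fails, the non-zero completion code lets the client fail the backup instead of acknowledging it, which is the outcome VDC_Complete was introduced to make possible.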

Reference

How It Works: SQL Server Backup Buffer Exchange (a VDI Focus)

SQL Server Virtual Backup Device Interface (VDI) Specification

SQL Server Backup Simulator

Updated SQLVDI Header files required for VDC_Complete