Awesomesauce: Finding out missing sequences


Another of those #sqlhelp inspired posts. This was around ways to track down missing numbers in a sequence table. This task can be done in multiple ways and as I like to say that there are multiple ways to skin a cat! However, this blog post is about using the new T-SQL enhancements in SQL Server 2012 to figure out missing sequential numbers. I will demonstrate this using an example.

I have a table tblsequences which has two integer columns with the primary key being an int datatype. I did some random inserts into the table. The script below can be used to find out missing sequential numbers in an identity column or an integer column which is supposed to store sequential values.


-- Replace starting value with minimum starting value and increment for your sequence
-- Replace the table name with the table name that you are interested in
declare @startvalue int = 1, @increment int = 1
;with cte as
(
select a,(a-lag(a,1) OVER (ORDER BY a)) as MissingSequences
from tblsequences
)
select a, (MissingSequences/@increment)-1 as MissingSequences
from cte
where MissingSequences > @increment
union all
select TOP 1 MIN (a),
CASE (MIN(a)- @startvalue)/@increment
when 0 then null
else (MIN(a)- @startvalue)/@increment
end as MissingSequences
from tblsequences
group by a
order by a

The output is as shown below in the screenshot. You will notice that the first column reports the primary key value and the MissingSequences column reports the number of missing sequential values lesser than the value in the first column. You will need to make the necessary changes to the two scripts shown in this blog post so that the tables you want analyzed are done so.

image

Next, lets talk about Sequences which is a new feature in SQL Server 2012. I created a new sequence using the a start value of 10 and an increment value of 3.

I used the Sequence which was created for generating OrderID values in two different tables. Now we are posed with an interesting problem. I have two different tables in which I want to compare missing sequence numbers. This can be done with T-SQL code below.


-- Replace schema name, table name(s) and sequence name as appropriate
declare @startvalue int = 1,@interval int = 1, @seqname sysname = 'TestSeq', @schemaname sysname = 'dbo'
select @startvalue = TRY_CAST(TRY_CAST(start_value as varchar(255)) as int),
@interval = TRY_CAST(TRY_CAST(increment as varchar(255)) as int)
from sys.sequences
where name = @seqname and [schema_id] = (select [schema_id] from sys.schemas where name = @schemaname)

if (@startvalue IS NOT NULL and @interval IS NOT NULL)
begin
;with cte as
(
select OrderID,(OrderID-lag(OrderID,1) over (order by OrderID)) as MissingSequences
from
(select OrderId as OrderID
from tblTestSeq
union all
select OrderId as OrderID
from tblTestSeq_2) A
)
select OrderID, (MissingSequences/@interval)-1 as MissingSequences
from cte
where MissingSequences > @interval
union all
select TOP 1 OrderID,
CASE (MIN(OrderID)- @startvalue)/@interval
when 0 then null
else (MIN(OrderID)- @startvalue)/@interval
end as MissingSequences
from tblTestSeq
group by OrderID
order by OrderID
end
else

else
begin
PRINT 'CAST FAILED'
end


The logic used is the same as the logic when we were looking for missing sequences in identity/integer columns described in the first half of the post. However, since sequences can be used for one or more tables, I have used a UNION query to get all the columns together which use the same sequence object. The second result set of the output in the screenshot below shows the tables involved along with the sequential OrderIDs. As you can see below, there are 4sequence numbers missing  10, 13, 16 and 28. The first result set reports exactly that by notifying that there are 3 sequence ids missing lesser than 19 and 1 sequence number missing lesser than 31.

The output of the above T-SQL script is shown in the screenshot below.

image

Another way how T-SQL enhancements in SQL Server 2012 can make your life easier.

Happy T-SQL coding and a have a great weekend!

Advertisements

Hello Analytic Functions


SQL Server 2012 CTP 3, formerly known as SQL Server Code Name “Denali”, introduces a new set of T-SQL functions called Analytic functions. Analytic functions now open up a new vista for business intelligence where in you can calculate moving averages, running totals, percentages or top-N results within a group. I find this very useful while analyzing performance issues while traversing information present in a SQL Server trace file.

I was looking into a performance issue where in an application module executing a series of T-SQL functions was taking a long time to complete it’s operation. When I looked into the total duration of the T-SQL queries executed by the application, I couldn’t account for the total duration that the application was reporting. On tracking some of the statement executions done by the SPID which was being used by the application to execute the queries, I found a difference between the start time of a batch and the completed time of the previous batch. Now I needed to see the complete time difference between two subsequent query completion and start accounted for the difference in duration that I was seeing between the duration reported by the application and sum of duration of all the queries executed by the application. And BINGO… I was finally able to make the co-relation. Till SQL Server 2008 R2, I would have to write a query which involved a self-join to get the comparative analysis that I required:

;WITH cte AS
(SELECT b.name, a.starttime, a.endtime, a.transactionid, a.EventSequence, ROW_NUMBER() OVER(ORDER BY eventsequence) AS RowIDs
FROM trace a
INNER JOIN sys.trace_events b
ON a.eventclass = b.trace_event_id
WHERE spid = 83
AND b.name IN ('RPC:Starting','RPC:Completed','SQL:BatchStarting','SQL:BatchCompleted'))
SELECT TOP 1000 b.name, b.starttime, b.endtime, b.transactionid, DATEDIFF(S,a.endtime,b.starttime) as time_diff_seconds
FROM cte a
LEFT OUTER cte b
ON a.RowIDs = b.RowIDs-1

The output of the above query is shown in the screen shot below:

image

As you can see that there is a 4-second delay between the endtime of the statement in Row# 783 and the next execution shown in Row# 784. With the help of Analytic functions, I can simply use the LEAD function to get the above result and avoid a self-join.

SELECT  TOP 1000 a.name,b.StartTime,b.EndTime,b.TransactionID,
DATEDIFF(s,(LEAD(b.EndTime,1,0) OVER (ORDER BY EventSequence DESC)),b.StartTime) as TimeDiff
FROM sys.trace_events a
INNER JOIN dbo.trace b
on a.trace_event_id = b.EventClass
WHERE b.SPID = 83
and a.name in ('RPC:Starting','RPC:Completed','SQL:BatchStarting','SQL:BatchCompleted')

The output as you can see is the same the previous query:

image

I had imported the data from the profiler trace into a SQL Server database table using the function: fn_trace_gettable. Let’s see what the query plans look like. For the first query which uses the common table expression and a self-join, the graphical query plan is as follows:

image

Now let’s see what the query plan looks like with the new LEAD function in action:

image

As you can see above a new Window Spool operator is the one which performs the analytical operation to calculate the time difference between the subsequent rows using the EventSequence number. As you can see that I have eliminated the need for a self-join with a temporary table or a common table expression and therefore simplifying my query in the process.

In the above example I am using the LEAD function to get value that I am interested in the following row. If you are interested in the values from a preceding row then you can use LAG function.

One gotcha that you need to remember here is that if you don’t take care of the start and end values of the dataset which you are grouping, you could run into the following error due to an overflow or underflow condition.

Msg 535, Level 16, State 0, Line 1
The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart.

This is a small example of how analytic functions can help reduce T-SQL complexity when calculating averages, percentiles for grouped data. Happy coding!!

Disclaimer: This information is based on the SQL Server 2012 CTP 3 (Build 11.0.1440), formerly known as SQL Server Code Name “Denali” documentation provided on MSDN which is subject to change in later releases.