Hello Analytic Functions


SQL Server 2012 CTP 3, formerly known as SQL Server Code Name “Denali”, introduces a new set of T-SQL functions called Analytic functions. Analytic functions now open up a new vista for business intelligence where in you can calculate moving averages, running totals, percentages or top-N results within a group. I find this very useful while analyzing performance issues while traversing information present in a SQL Server trace file.

I was looking into a performance issue where in an application module executing a series of T-SQL functions was taking a long time to complete it’s operation. When I looked into the total duration of the T-SQL queries executed by the application, I couldn’t account for the total duration that the application was reporting. On tracking some of the statement executions done by the SPID which was being used by the application to execute the queries, I found a difference between the start time of a batch and the completed time of the previous batch. Now I needed to see the complete time difference between two subsequent query completion and start accounted for the difference in duration that I was seeing between the duration reported by the application and sum of duration of all the queries executed by the application. And BINGO… I was finally able to make the co-relation. Till SQL Server 2008 R2, I would have to write a query which involved a self-join to get the comparative analysis that I required:

;WITH cte AS
(SELECT b.name, a.starttime, a.endtime, a.transactionid, a.EventSequence, ROW_NUMBER() OVER(ORDER BY eventsequence) AS RowIDs
FROM trace a
INNER JOIN sys.trace_events b
ON a.eventclass = b.trace_event_id
WHERE spid = 83
AND b.name IN ('RPC:Starting','RPC:Completed','SQL:BatchStarting','SQL:BatchCompleted'))
SELECT TOP 1000 b.name, b.starttime, b.endtime, b.transactionid, DATEDIFF(S,a.endtime,b.starttime) as time_diff_seconds
FROM cte a
LEFT OUTER cte b
ON a.RowIDs = b.RowIDs-1

The output of the above query is shown in the screen shot below:

image

As you can see that there is a 4-second delay between the endtime of the statement in Row# 783 and the next execution shown in Row# 784. With the help of Analytic functions, I can simply use the LEAD function to get the above result and avoid a self-join.

SELECT  TOP 1000 a.name,b.StartTime,b.EndTime,b.TransactionID,
DATEDIFF(s,(LEAD(b.EndTime,1,0) OVER (ORDER BY EventSequence DESC)),b.StartTime) as TimeDiff
FROM sys.trace_events a
INNER JOIN dbo.trace b
on a.trace_event_id = b.EventClass
WHERE b.SPID = 83
and a.name in ('RPC:Starting','RPC:Completed','SQL:BatchStarting','SQL:BatchCompleted')

The output as you can see is the same the previous query:

image

I had imported the data from the profiler trace into a SQL Server database table using the function: fn_trace_gettable. Let’s see what the query plans look like. For the first query which uses the common table expression and a self-join, the graphical query plan is as follows:

image

Now let’s see what the query plan looks like with the new LEAD function in action:

image

As you can see above a new Window Spool operator is the one which performs the analytical operation to calculate the time difference between the subsequent rows using the EventSequence number. As you can see that I have eliminated the need for a self-join with a temporary table or a common table expression and therefore simplifying my query in the process.

In the above example I am using the LEAD function to get value that I am interested in the following row. If you are interested in the values from a preceding row then you can use LAG function.

One gotcha that you need to remember here is that if you don’t take care of the start and end values of the dataset which you are grouping, you could run into the following error due to an overflow or underflow condition.

Msg 535, Level 16, State 0, Line 1
The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart.

This is a small example of how analytic functions can help reduce T-SQL complexity when calculating averages, percentiles for grouped data. Happy coding!!

Disclaimer: This information is based on the SQL Server 2012 CTP 3 (Build 11.0.1440), formerly known as SQL Server Code Name “Denali” documentation provided on MSDN which is subject to change in later releases.

Advertisements

T-SQL Tuesday #20 Wrap-up and a few smiles


I had the privilege of hosting this month’s T-SQL Tuesday which is a recurring blog party, the brainchild of MVP Adam Machanic (blog|twitter). It goes without saying that the posts this time were a wonderful read and it’s very encouraging to see such brilliant engagement from the community. A big “thank you” to all the people who took the time to pen down a post for this month’s topic: “T-SQL Best Practices”. If you are a newcomer to the world of T-SQL or a seasoned pro, the insights shared in these posts are definitely a good read if you work with SQL Server.

Below is a round-up of all the posts from this month (in chronological order of comments appearing on my invitation post):

1. Jason Bacani [Blog | Twitter] writes about why T-SQL formatting is best as well as good practice and makes it easier for the person supporting the code that you have written. I feel motivated on working on the code only if I like looking at it.

2. Rob Farley [Blog | Twitter], a popular SQL Server MVP, talks about why understanding query plans can help in writing better and more efficient T-SQL code. Totally agree with Rob here and from experience I can say that this goes a long way in writing efficient T-SQL code.

3. Dirk Wegener [Blog] talks about why including comments to explain your code logic and reiterates why T-SQL formatting is important in his first T-SQL Tuesday post. Way to go Dirk and we look forward to your participation in the future.

4. Noel McKinney [Blog | Twitter] talks about coding best practices which are quite pertinent in today’s development world. I completely agree that everything shouldn’t be done through T-SQL. Sometimes exploring other avenues may be the right answer!

5. Pinal Dave [Blog | Twitter], a friend and now a colleague at Microsoft, talks about practices that can be avoided in environments generally and not just from a T-SQL perspective.

6. Thomas Rushton [Blog | Twitter] talks about coding best practices, some of which is generally applicable to the world outside T-SQL as well. Absolutely loved the quote at the end of your post!

7. Rich Brown [Blog | Twitter] shows with an example how a developer managed to get into trouble by using an ORDER BY construct for an INSERT statement.

8. Sebastian Meine [Blog | Twitter] shows why an interface layer is required between the application and database. This is quite difficult to achieve due to various reasons (some of it maybe just ignorance sometimes) but if you can pull it off, it really does save a lot of time during a post-deployment crisis scenario.

9. Muthukkumaran Kaliyamoorthy [Blog | Twitter] talks about a list of DOs and DON’Ts while writing T-SQL code.

10. Nick Haslam [Blog | Twitter] talks about how multiple database files can help improve throughput of your DML queries for large data warehouse with the help of some TPC-H test numbers that he shares in the same post.

11. Marcos Freccia [Blog | Twitter] talks about a recent performance problem he resolved by avoiding an implicit convert operation in the query plan with the help of a small change in his T-SQL query. An example of how following best practices can help performance.

12. Matt Velic [Blog | Twitter] talks about best practices to follow while writing T-SQL code from a readability, formatting and code clarity standpoint. Supporting and fixing a code is much faster if it’s easy to read!

13. Robert Matthew Cook [Blog | Twitter] shows how not following simple best practices led to a cascading effect where a stored procedure failed to execute. He goes the extra mile to explain why the issue happened!

14. Erin Stellato [Blog | Twitter] explains why following T-SQL coding standards and adding comments to your code can help save painful hours of searching and brain-wracking during the time of a crisis.

15. Josh Feierman [Blog | Twitter] talks about what can be done in the current scheme of things with the business angle kept in mind. And what he talks about in his post are ground realities and what can be done to counter them to make us efficient in what we do without having sleepless nights while handling a critical issue.

16. Aaron Bertrand [Blog | Twitter], a popular MVP, talks about some of the best practices that should be followed while writing T-SQL code. He provides examples on what kind of issues you could run into when such practices are not followed.

17. Jack Vamvas [Blog | Twitter] gives an example and shows how filtered indexes come to the rescue when T-SQL code which doesn’t follow best practices cannot be tuned due to support and license agreements.

18. Gabriel Villa [Blog | Twitter] talks about T-SQL coding best practices which would help in better management of the code base in the post-deployment phase. This would definitely lead to less consumption of aspirins for the team managing and supporting the code.

19. Nancy Hidy Wilson [Blog | Twitter] talks about adding proper comments to your T-SQL code in a crisp and concise post.

20. Ricardo [Blog | Twitter] re-iterates why comments in the code are necessary but they necessarily don’t need to be short stories. Additionally, he mentions about testing the T-SQL code to ensure that it works in production and not just on the Development box that you are using it write the code on. The post is in Spanish but Bing Translate helped me read the English version of the post and I did have a smile on my face while reading through it.

21. Colleen Morrow [Blog | Twitter] explains with an example why using functions in the predicate of a T-SQL query can be bad karma for your code.

22. Carlos Bossy [Blog | Twitter] explains in his post why T-SQL code made to function in a procedural manner is not probably the best approach.

23. Jes Borland [Blog | Twitter] shares her top three T-SQL best practices and how keeping it simple is the best idea!

Once again, my sincere thanks to all the SQL folks that contributed to this month’s T-SQL Tuesday! And also, my thanks to Adam for letting me host this virtual blog party for this month!

Since a major part of my daily job involves troubleshooting SQL Server performance related issues, I do come across T-SQL code which from a best practices standpoint has a lot to be desired. So, I would say that the next few lines are my 2 cents from what I have seen based on my experience but I ‘m afraid it’s going to turn out to be a $1 worth of rambling!

1. If your company has established coding standards/formats, follow them – This helps keep consistent across the board and the standards and formats were put in place majority of the time with a good intention in mind. If there aren’t any defined, then define a set for yourself and follow them!
2. Use some form of source-control – It could be as simple as adding comments in the procedure code itself. It really saves time and helps blood pressure levels stay normal when trying to find out why a small change in the code was made and when, which is the current cause of the performance bottleneck!
3. Add comments/description to your code – I am yet to come across a production application which has a single person in-charge of the production, support, maintenance and version upgrades. Your code more often than not will be supported and maintained by someone else. Help that person understand why that piece of code is there and what is it supposed to do. What is obvious to you may not be obvious to someone else. So, rather than having a DBA curse you at an unearthly hour, mention (a single line is sufficient most times) why a best practice was not followed.
4. T-SQL is not the answer for everything – There are some things that can probably be done much better through code in some other language like C#. Don’t be afraid to explore such an option. At the end of the day, SQL Server is a database platform and NOT an application server. (*ducks and runs*)
5. It doesn’t hurt to look under the hood – It’s not necessary to know how the query optimizer works or how to read a query plan or even how indexes should be designed for tables for writing T-SQL code. But if you did understand some of this if not a majority of it, you will find that you end up writing much more efficient T-SQL code than you used to when you didn’t know some of under-the-hood workings of the database engine.

Do not misunderstand the above points to mean that I write the best T-SQL code can be written. All I am trying to say is that write T-SQL code which is easy to read, decipher and support! Even then as a Developer you might not become a DBA’s best friend overnight but you would have taken a giant step in getting there and bridging that chasm that exists between the two worlds! Smile

While reading Thomas Rushton’s [Blog | Twitter] submission this month I came across the quote mentioned below. Going forward, this is going to be my punch-line when I talk about best practices. If you are one of those individuals where a visual aid helps, then the above picture should probably aid in following best practices while writing T-SQL code.

Always code as if the person who ends up maintaining your code is a violent psychopath who knows where you live.

*courtesy Jeff Atwood (blog|twitter)

In case you want to keep getting updates on what is happening on my blog, then you can do so using one of the icons below (Facebook page, Twitter or RSS Feeds).

Facebook Fan PageTwitterRSS Feeds

Disclaimer: I have provided the correct links to your blog, twitter accounts and posts to the best of my knowledge. In case there are discrepancies, please leave a comment and I shall make the necessary corrections. Such an error was purely unintentional!

Invitation for T-SQL Tuesday #20 – T-SQL Best Practices


image

What is T-SQL Tuesday?

T-SQL Tuesday is the brain-child of MVP Adam Machanic (blog|twitter). It happens once a month on the 2nd Tuesday and is hosted by a different person in the SQL community each time. The idea is to get the SQL Community blogging on a common topic on a single day of the month after which a round-up of all the participating posts are done by the hosting blog owner to give the SQL community a one-stop shop of some great articles on a single topic.

What is the topic for July?

The topic for July is “T-SQL Best Practices”. If you work with SQL Server, then undoubtedly you would have had to write T-SQL queries atleast once or would have had to debug the seemingly useful piece of T-SQL code written by your developers to find out where the performance bottleneck or problem was. Your post for this month’s revolving blog party could be along one of the areas:

a. A set of T-SQL best practices that you follow in your shop that or you believe that should be followed always. It could be as specific as for writing only linked server queries or writing queries for SSIS packages etc.
b. An issue that you resolved because certain T-SQL best practices were not followed.
c. A workaround that you used (like query hints) to resolve an issue where T-SQL best practices couldn’t be implemented due to involvement of a third party solution.

Why did I choose this topic?

Over the years of troubleshooting SQL performance related issues, I have found on multiple occasions that the T-SQL query in question was performing badly because certain best practices for writing that piece of code were not followed and the one responsible for the development had not foreseen that such an oversight could become a bottleneck when the data or the number of users increased. So, I thought it would be a good idea to get the SQL Community’s thoughts around best practices in this area. Sometimes, the most obvious things are the easiest to overlook!

What are the Rules?

T-SQL Tuesday has some basic rules which are pretty easy to follow.

1. Your post must be published between 00:00 GMT Tuesday July 12, 2011, and 00:00 GMT Wednesday July 13, 2011.
2. Your post must contain the T-SQL Tuesday logo from above and the image should link back to this blog post.
3. Trackbacks should work, but if you don’t see one please link to your post in the comments section below so everyone can view your blog post.

Good-to-have

  • Include a reference to T-SQL Tuesday in the title of your post
  • Tweet about your post using the hash tag #TSQL2sDay
  • Being creative always fetches brownie points and gets a smile from the reader!

So now it’s time for:

SELECT [content] FROM [dbo].[tblbrain] WHERE [content_type] = ‘T-SQL’

Before I wrap-up this post, a big thanks to Adam for letting me host T-SQL Tuesday for this month!