This month’s T-SQL Tuesday is being hosted by Jes Borland (Twitter | Blog) and the topic for this month is “Aggregate Functions”. What really caught my eye was the play on the T-SQL functions to accentuate the fact why Jes justifies her twitter handle of grrrl_geek:
Let’s SUM(thoughts), COUNT(ways we’ve done things), and set MAX(awesome) on our posts!
So at this point, I shall SELECT Content FROM PastExperience WHERE Type = ‘Aggregate Functions’ and pen it together in this blog post!
Aggregate functions have been a great way to slice-and-dice data stored in any RDBMS product and present it in the form which the business wants or would want to look at it. I have seen and also written multiple T-SQL queries in the past which use the HAVING clause along with aggregation functions to filter out data from the final dataset. I was very excited about the use of GROUPING option using ROLLUP for the set of canned queries that I had for aggregating analysis for SQL Performance cases while using SQL Nexus/RML Utilities.
The query below gives me an aggregation of the SUM of all the CPU/Duration/Reads/Writes which completed execution during the time the profiler traces were captured.
select a.hashid,b.OrigText, SUM(a.duration) as SumDuration, SUM(a.Reads) as SumReads, Sum(a.Writes) as SumWrites,SUM(a.CPU) as SumCPU, MAX(a.duration) as MAXDuration, MAX(a.Reads) as MAXReads, MAX(a.Writes) as MAXWrites,MAX(a.CPU) as MAXCPU, MIN(a.duration) as MINDuration, MIN(a.Reads) as MINReads, MIN(a.Writes) as SumWrites,MIN(a.CPU) as MINCPU, AVG(a.duration) as AVGDuration, SUM(a.Reads) as AVGReads, Sum(a.Writes) as AVGWrites,SUM(a.CPU) as AVGCPU, GROUPING(a.hashid) AS [GroupNo] from readtrace.tblBatches a inner join readtrace.tblUniqueBatches b on a.HashID = b.HashID where EndTime is not null group by a.hashid,b.OrigText WITH ROLLUP
With the use of the above query, I get the all the queries with their average, minimum, maximum and total statistics of all the queries along with a rollup of the reads, writes, duration and CPU.
COMPUTE is another new introduction into the world of T-SQL which can really help with data analysis especially from my perspective, since I spend a significant part of my day looking into SQL Performance issues. The query below will give me all the long running batches along with the duration/CPU information for each run with a computed sub-total for each such aggregated row.
select a.spid,b.origtext,a.duration, a.cpu from readtrace.tblBatches a inner join readtrace.tblUniqueBatches b on a.HashID = b.HashID where EndTime is not null order by a.duration,a.spid compute sum(duration) , sum(cpu) BY a.duration,a.spid;
An example output is:
Another example where Aggregation helps is identifying the amount of CPU usage by queries which were captured using profiler traces. Why is this important, you ask? This is very important when I need to compare the CPU usage time reported by SQL Server (captured using Perfmon/WMI etc.) and the actual time the user queries spent executing on CPU. The Statement, RPC, Batch and SP completed events capture the CPU consumption for each request. Aggregating them together gives you the ability to figure out the amount of time that was spent by user queries while executing on CPU. The approach used below uses UNIONS and SUM to figure out the total percentage of CPU used by the queries which completed during the profiler trace capture. The only parameter value that you need to provide is the number of CPUs i.e. the number of logical CPUs being used by the SQL instance. The reason I highlighted the phrase in the previous line is because that the below T-SQL batch works on the presumption that the profiler traces which were captured had all the completed events and captured completion events for all queries that were executing during the timeframe of the data collection. This batch would provide incorrect results if you have a long running CPU driven query which continues executing even after the profiler trace capture is stopped. This batch is ideal for scenarios where a profiler trace was started, a workload executed on the server and then the profiler trace was stopped. Under such a scenario, this would help you in identifying the percentage CPU usage for all queries executed by the workload.
DECLARE @MinTime DATETIME DECLARE @MaxTime DATETIME DECLARE @ms_diff BIGINT DECLARE @number_of_cpus INTEGER SET @number_of_cpus = 2 --Change this as necessary SELECT @MinTime = MIN(ISNULL(u.MinTime, '12-31-9999')) FROM ( SELECT MIN(StartTime) AS MinTime FROM ReadTrace.tblBatches UNION ALL SELECT MIN(EndTime) AS MinTime FROM ReadTrace.tblBatches UNION ALL SELECT MIN(StartTime) AS MinTime FROM ReadTrace.tblStatements UNION ALL SELECT MIN(EndTime) AS MinTime FROM ReadTrace.tblStatements ) AS u SELECT @MaxTime = MAX(ISNULL(u.MaxTime, '01-01-1900')) FROM ( SELECT MAX(StartTime) AS MaxTime FROM ReadTrace.tblBatches UNION ALL SELECT MAX(EndTime) AS MaxTime FROM ReadTrace.tblBatches UNION ALL SELECT MAX(StartTime) AS MaxTime FROM ReadTrace.tblStatements UNION ALL SELECT MAX(EndTime) AS MaxTime FROM ReadTrace.tblStatements ) AS u IF (@MinTime = '12-31-9999' OR @MaxTime = '01-01-1900') BEGIN RAISERROR ('ERROR: The trace does not contain appropriate columns to determine the min/max time values for events in the trace.', 16, 1) RETURN END SELECT @ms_diff = DATEDIFF(ms, @MinTime, @MaxTime) * @number_of_cpus FROM ReadTrace.tblBatches AS b SELECT @MinTime AS MinTime, @MaxTime AS MaxTime, @ms_diff AS total_CPU_milliseconds, @number_of_cpus AS NumCpus SELECT (CAST(SUM(cpu) AS FLOAT) / @ms_diff) * 100 AS cpu_percentage, SUM(cpu) AS sum_cpu, COUNT(*) AS executions, ub.normtext INTO #temp_cpu_percentages FROM ReadTrace.tblBatches AS b INNER JOIN ReadTrace.tblUniqueBatches AS ub ON b.HashID = ub.HashId GROUP BY ub.normtext HAVING SUM(cpu) > 0 ORDER BY sum_cpu DESC SELECT SUM(t.cpu_percentage) AS SQL_cpu_usage_percent FROM #temp_cpu_percentages AS t SELECT TOP 10 * FROM #temp_cpu_percentages AS t ORDER BY t.cpu_percentage DESC DROP TABLE #temp_cpu_percentages
What was the objective of this post?
1. Point out two not-so-commonly used T-SQL functionality (GROUPING/COMPUTE/ROLLUP) which can be quite helpful.
2. Show a few ways of aggregating, slicing and dicing data imported into a SQL Nexus database for quick analysis.
Books Online references: