I had written about update statistics a long time back and had been meaning to write a clarification on the difference in behavior w.r.t. table statistics being updated when rebuilding indexes using different T-SQL commands. I am sure all of you are aware of why statistics for a table need to be updated with a fullscan to ensure that the optimizer doesn’t fall prey to sub-optimal plans. However, there is a little known fact about how statistics get affected when a DBREINDEX is issued on a table versus an ALTER INDEX…REBUILD ALL.

I created a table with 4 integer columns and inserted 100,000 records into the table using the script below:

-- Create table
create table tblUpdStats (a int primary key,b int,c int, d int)
-- Insert 100,000 rows in the table
set nocount on
declare @var int = 1
while @var<=100000
insert into tblUpdStats values(@var,@var+1,@var+2,@var+3)
set @var+=1
-- Create a non-clustered index on column b
create index idx_b on tblUpdStats(b)
-- Create a user-defined statistics on column c
create statistics stat_c on tblUpdStats(c)
-- Execute the following to query so that the database engine creates an auto-created stats on column d
select d from tblUpdStats where d > 50000
Name                            Updated              Rows    Rows Sampled
------------------------------- -------------------- ------- ------------
idx_b                           May 13 2011  8:03PM  100000  100000
PK__tblUpdSt__3BD0198E558AAF1E  May 13 2011  8:03PM  100000  100000
stat_c                          May 13 2011  8:03PM  100000  100000
_WA_Sys_00000004_53A266AC       May 13 2011  8:03PM  100000  100000

I took a snapshot of the DBCC SHOW_STATISTICS output for the statistics associated with the non-clustered index, clustered index, user created statistics and auto-created statistics. All the rows and rows sampled are equal which means that the statistics are updated with a fullscan. I then inserted another 100,000 rows into the same table. I then executed the following command to rebuild the indexes on the table:

alter index all on tblstats rebuild

I took a snapshot of the DBCC SHOW_STATISTICS command again and notice the difference below:

Name                            Updated              Rows     Rows Sampled
idx_b                           May 13 2011  8:19PM  200000   200000
PK__tblUpdSt__3BD0198E558AAF1E  May 13 2011  8:19PM  200000   200000
stat_c                          May 13 2011  8:03PM  100000   100000
_WA_Sys_00000004_53A266AC       May 13 2011  8:03PM  100000   100000

You will see that the user created and auto-created statistics do not reflect the change in the number of rows. However when I execute a DBREINDEX on the table, I see a different picture altogether. (See show_statistics output below)

dbcc dbreindex ('tblUpdStats','')
Name                               Updated              Rows      Rows Sampled
 ---------------------------------- -------------------- --------- --------------
 idx_b                              May 13 2011  8:31PM  200000    200000
 PK__tblUpdSt__3BD0198E558AAF1E     May 13 2011  8:31PM  200000    200000
 stat_c                             May 13 2011  8:31PM  200000    200000
_WA_Sys_00000004_53A266AC           May 13 2011  8:31PM  200000    200000

Now if I look at the rows sampled and rows column values, I see that the updated column shows the same date for all the objects. In this case the rows sampled and rows are the same indicating that a fullscan occurred for the user created/auto created statistics on the table.

To summarize, ALTER INDEX…REBUILD ALL will not update statistics which were manually created or were auto-created by the Database Engine. It will only update statistics associated with the indexes defined on the table columns. On the other hand, a DBCC DBREINDEX on the table will update statistics for the user created statistics and the auto-created statistics using the default sampling rate. The default sampling comes into play for very large tables. If you have a maintenance plan job which does an index rebuild operation on SQL Server 2005 or above, then that uses an ALTER INDEX…REBUILD command.


Tools Tips and Tricks #1: Process Monitor

I recently wrote about importing a Process Monitor trace into SQL Server database table and crunch up the data to extract the events and call stacks. This prompted me to think about capturing data with Process Monitor and some things I learnt along way while using this tool working at CSS.

imageThe first tip is to disable any activity that you don’t want to capture or are not required for the issue that you are troubleshooting. The capture tracks three classes of operations: File System, Registry and Process. In the toolbar show on the left in the screenshot, you can enable/disable the following captures:

a. Registry activity
b. File System activity
c. Network activity
d. Process and Thread activity
e. Profiling events

More information about the above is available in the Process Monitor help file. image

The command line options specified are immensely helpful if you are scripting the capture of a trace using a batch file or if you are generating an automation routine to load the captured data into another data source. I had used /OpenLog and /SaveAs1 option to generate the XML file from the saved .PML file.

imageOne of the most useful options that I suggest using when capturing a Process Monitor trace is to use the backing file option (/BackingFile command line parameter or CTRL+B when using the GUI). This prevents using the page file as the backing store for trace capture and avoid running in unresponsive server issues while you are still capturing your trace and the paging file fills up. I normally point the backing file to a local drive on the machine which has sufficient amount of disk space.image

Process Monitor can use symbol information, if available, to show functions referenced on event stacks. You can point to the symbol path (local symbol cache or Microsoft Symbol Server: http://msdl.microsoft.com/download/symbols) using Options –> Configure Symbols. Additionally, you can specify the path to the source files for the application in the same dialog. This will help you resolve the function calls using the symbol path and if a source path is present, open a text viewer dialog with the source line highlighted which is being referenced. The symbol path is needed when /SaveAs2 option is used for converting the .PML file to .XML format. Note that this option considerably increases the export time due to symbol resolution time involved.

I am starting a series tagged with “Tools Tips and Tricks” which will document the various tweaks that I use for data collection for the various data collection/analysis tools that I use on a day-to-day basic.