How to update statistics for a VLDB


Statistics are something that the SQL Optimizer depends on to generate a plan. The more outdated your statistics are, greater are the chances that your query optimizer will land up with a sub-optimal plan.
So what do you do when you have a database which is highly transactional in nature and is quite large in size. So, you might land up in a scenario where the database is online 24X7 and there is no downtime window where you can update your statistics with a FULLSCAN. 

Before we get into what my opinion is about updating statistics of a database is, lets make sure we understand what I mean by statistics being “up-to-date“. Statistics being up-to-date is the sampling rate used while updating statistics was 100% or they were updated with a FULLSCAN. Statistics updated using AUTO UPDATE STATISTICS property of a database will not update statistics with a FULLSCAN contrary to popular belief. The sole reason for this being is that suppose a table has 1 GB of data and a query on the database initiated a AUTO UPDATE of a STAT on this database, then if the sampling was 100%, then your query duration would take an eternity. 

So, if you have a large database:
1. Ensure that the AUTO UPDATE STATS is turned on for the database. This would provide for some relief
2. Identify which tables need to have their statistics updated with a fullscan. Only those would be required which have the maximum number of queries running on them
3. Next identify a period when the database usage is low. This is the time when you can use to update the stats of the other tables which did not qualify the list in #2.
4. Use a script to update statistics of the tables identified step # 2 probably based on rowmodctr values (ROWMODCTR: Counts the total number of inserted, deleted, or updated rows since the last time statistics were updated for the table. In SQL Server 2005, this is not going to be always helpful [Ref: http://msdn2.microsoft.com/en-us/library/ms190283.aspx]. But in SQL 2000, this could be used as a deciding factor) 

Sample Script
============
Let’s say I had identified 4 tables in STEP 2:
CREATE TABLE UPD_STATS_TBL (tblname varchar(10), dt_updated datetime, rowmodctr bigint) 

SELECT * FROM UPD_STATS_TBL 

tblname dt_updated rowmodctr 

TBL1 2007-10-24 21:13:46.123 20000
TBL2 2007-10-23 21:13:46.123 400000
TBL3 2007-10-22 21:13:46.123 508000
TBL4 2007-10-24 20:13:46.123 87000
 

**************************************************  

CREATE PROC UPD_STATS_DB 

AS 

DECLARE @tbl varchar(10) 

UPDATE upd_stats_tbl SET rowmodctr = (SELECT MAX(rowmodctr) from sys.sysindexes where id=OBJECT_ID(tblname)) 

SELECT TOP 1 @tbl = tblname 

FROM UPD_STATS_TBL 

WHERE rowmodctr > 10000 

ORDER BY dt_updated 

-- For SQL Server 2005, you would have to used the STATS DATE function instead of rowmodctr values. Refer my statistics update post for more details. 

DECLARE @stmt varchar (100) 

SET @stmt = 'UPDATE STATISTICS'+SPACE(1)+@tbl+SPACE(1)+'WITH FULLSCAN' 

EXEC (@stmt) 

UPDATE upd_stats_tbl SET dt_updated = (SELECT GETDATE()) 

WHERE tblname = @tbl

**************************************************
NOTE: You could refine this down to the table statistic if you wanted to.

Advertisements

It is always good to hear from you! :)

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s