The need for a sound Backup Strategy

A backup strategy needs to be designed in such a manner that it causes the minimum amount of downtime when backups have to be restored during a disaster recovery scenario. The backup strategy depends on the following:
1. Size of the database – This would determine how often a FULL backup can be taken
2. Recovery model – This would determine whether transaction log/differential backups are possible
3. The importance of the database – This would determine how often transaction log backups need to be taken if the database is in the full recovery model.

If the database is in the “simple” recovery model, then transaction log backups are not possible, and full backups (supplemented by differential backups) are the only option. If the database is in the FULL or BULK-LOGGED recovery model, then we additionally have the option of taking transaction log backups.
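To illustrate (the database name SalesDB is purely a placeholder), the recovery model can be checked with DATABASEPROPERTYEX and changed with ALTER DATABASE:

-- Check the current recovery model (works on SQL 2000 and 2005)
SELECT DATABASEPROPERTYEX('SalesDB', 'Recovery')

-- Switch to the full recovery model so transaction log backups become possible
ALTER DATABASE SalesDB SET RECOVERY FULL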

Now the next question is how often to take a transaction log backup. Depending on the importance of the database, transaction log backups can be taken every 10-15 minutes or even every hour; this needs to be determined at your end. Furthermore, if the transaction log backups are being taken very frequently and a full backup is taken once a week, then it is advisable to take differential backups in the middle of the week. This makes the restore process less cumbersome: after the full backup, the latest differential backup can be applied, followed by the transaction log backups taken after that differential.

In case of full/bulk-logged recovery models, the following backup strategy would be ideal:

1. Full backup

a. Differential backup(s)

i. Transaction log backup(s)

b. Differential backup(s)

i. Transaction log backup(s)

2. Full backup (and repeat the above cycle)

However, any backup strategy that you design needs to be compliant with your business needs.
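As a rough sketch of the cycle above (the database name, file paths, and timings are purely illustrative):

-- 1. Weekly full backup (start of the cycle)
BACKUP DATABASE SalesDB TO DISK = 'D:\SQLBackups\SalesDB_full.bak' WITH INIT

-- i. Transaction log backups every 10-15 minutes or every hour, as needed
BACKUP LOG SalesDB TO DISK = 'D:\SQLBackups\SalesDB_log.trn'

-- a/b. Mid-week differential backup; a restore then needs only the full backup,
-- the latest differential, and the log backups taken after that differential
BACKUP DATABASE SalesDB TO DISK = 'D:\SQLBackups\SalesDB_diff.bak' WITH DIFFERENTIAL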

Furthermore, for creating sound fail-over strategies for disaster recovery scenarios, the following can be considered:
1. Log shipping

2. Database Mirroring (For SQL 2005 SP1 and above)

3. Replication (Snapshot, Transactional, Merge)

4. Failover Clustering

5. An in-house disaster recovery mechanism envisioned by your IT & DBA team, which could be a combination of one or more of the above methods.
Also, it is considered good practice to restore a backup on a test server and run a CHECKDB on the restored database to make sure that the backup is in good shape. This becomes of utmost importance when it is not possible to run a CHECKDB before taking a backup. In SQL Server 2005, you also have the option of performing a piecemeal restore, and restores can even be performed at the page level in case you have database corruption.
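A minimal sketch of such a verification, reusing the hypothetical SalesDB backup from above (the logical file names in the MOVE clauses would need to match your own database):

-- Restore the backup onto a test server under a different name
RESTORE DATABASE SalesDB_Test
FROM DISK = 'D:\SQLBackups\SalesDB_full.bak'
WITH MOVE 'SalesDB_Data' TO 'D:\TestRestore\SalesDB_Test.mdf',
     MOVE 'SalesDB_Log'  TO 'D:\TestRestore\SalesDB_Test.ldf'

-- Confirm the restored copy is free of corruption
DBCC CHECKDB ('SalesDB_Test') WITH NO_INFOMSGS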

The following articles could be helpful:

Overview of Restore and Recovery in SQL Server
http://msdn2.microsoft.com/en-us/library/ms191253.aspx

SQL Server 2005 provides the option of performing an online restore, which is available in SQL Server 2005 Enterprise Edition. Furthermore, you have the option of performing piecemeal restores, i.e. page-level restores, a single-file restore of a secondary data file in a filegroup while keeping the database online, etc.

Please refer to the following for more information:
Performing Online Restores
http://msdn2.microsoft.com/en-us/library/ms188671.aspx
Storage Top 10 Best Practices
http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/storage-top-10.mspx
Using Recovery Models
http://msdn2.microsoft.com/en-us/library/aa173678(SQL.80).aspx

How to update statistics for a VLDB

Statistics are something that the SQL Server query optimizer depends on to generate a plan. The more outdated your statistics are, the greater the chances that the query optimizer will end up with a sub-optimal plan.
So what do you do when you have a database which is highly transactional in nature and quite large in size? You might end up in a scenario where the database is online 24x7 and there is no downtime window in which you can update your statistics with a FULLSCAN.

Before we get into my opinion about updating statistics of a database, let’s make sure we understand what I mean by statistics being “up-to-date“: the sampling rate used while updating the statistics was 100%, i.e. they were updated with a FULLSCAN. Contrary to popular belief, statistics updated through the AUTO UPDATE STATISTICS property of a database are not updated with a FULLSCAN. The sole reason for this is that if a table had 1 GB of data and a query on the database initiated an AUTO UPDATE of a statistic on that table, then with a sampling rate of 100% that query’s duration would take an eternity.

So, if you have a large database:
1. Ensure that AUTO UPDATE STATISTICS is turned on for the database. This would provide some relief.
2. Identify which tables need to have their statistics updated with a FULLSCAN. Only those tables which have the maximum number of queries running against them would require it.
3. Next, identify a period when the database usage is low. This is the time you can use to update the statistics of the other tables which did not qualify for the list in #2.
4. Use a script (see the sample below) to update statistics of the tables identified in step #2, probably based on rowmodctr values (ROWMODCTR counts the total number of inserted, deleted, or updated rows since the last time statistics were updated for the table. In SQL Server 2005, this is not always going to be helpful [Ref: http://msdn2.microsoft.com/en-us/library/ms190283.aspx], but in SQL 2000 it could be used as a deciding factor).

Sample Script
============
Let’s say I had identified 4 tables in STEP 2:
CREATE TABLE UPD_STATS_TBL (tblname varchar(128), dt_updated datetime, rowmodctr bigint)

SELECT * FROM UPD_STATS_TBL

tblname  dt_updated               rowmodctr
-------  -----------------------  ---------
TBL1     2007-10-24 21:13:46.123  20000
TBL2     2007-10-23 21:13:46.123  400000
TBL3     2007-10-22 21:13:46.123  508000
TBL4     2007-10-24 20:13:46.123  87000
 

**************************************************  

CREATE PROC UPD_STATS_DB
AS
DECLARE @tbl varchar(128)
DECLARE @stmt varchar(200)

-- Refresh the modification counters from the system catalog
UPDATE upd_stats_tbl
SET rowmodctr = (SELECT MAX(rowmodctr) FROM sys.sysindexes WHERE id = OBJECT_ID(tblname))

-- Pick the least recently updated table that has crossed the modification threshold
SELECT TOP 1 @tbl = tblname
FROM UPD_STATS_TBL
WHERE rowmodctr > 10000
ORDER BY dt_updated

-- For SQL Server 2005, you would have to use the STATS_DATE function instead of
-- rowmodctr values. Refer to my statistics update post for more details.

IF @tbl IS NOT NULL
BEGIN
    SET @stmt = 'UPDATE STATISTICS' + SPACE(1) + @tbl + SPACE(1) + 'WITH FULLSCAN'
    EXEC (@stmt)

    -- Record when this table's statistics were last refreshed
    UPDATE upd_stats_tbl
    SET dt_updated = GETDATE()
    WHERE tblname = @tbl
END

**************************************************
NOTE: You could refine this further, down to individual statistics on a table, if you wanted to.
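As a usage sketch, the tracking table could be seeded with the four hypothetical tables from Step 2 and the procedure then run from a scheduled SQL Agent job:

-- Seed the tracking table once
INSERT INTO UPD_STATS_TBL (tblname, dt_updated, rowmodctr)
SELECT 'TBL1', GETDATE(), 0 UNION ALL
SELECT 'TBL2', GETDATE(), 0 UNION ALL
SELECT 'TBL3', GETDATE(), 0 UNION ALL
SELECT 'TBL4', GETDATE(), 0

-- Schedule this call; each execution updates statistics with FULLSCAN
-- on at most one qualifying table (the least recently updated one)
EXEC UPD_STATS_DB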

Statistics Update

This is an important task in terms of database maintenance. It ensures the statistics are up-to-date, which in turn ensures that the query optimizer doesn’t end up with sub-optimal plans. The two ways to monitor the need for a statistics update are:
1. STATS_DATE function

2. rowmodctr value in the sysindexes (For SQL 2000) and sys.sysindexes (For SQL 2005) output

The STATS_DATE function gives you the date when the statistics were last updated. The rowmodctr value tells you how many changes (updates, inserts, and deletes) have taken place in the data of the column the index is associated with. However, this counter is not reliable on SQL Server 2005, as SQL Server 2005 and higher versions use column modifications to track changes on tables; based on those, the statistics are updated if the AUTO UPDATE STATISTICS property of the database is set.
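For example, on SQL Server 2005 the last statistics update date for every index on a hypothetical Customers table can be pulled like this:

-- Date of the last statistics update for each index on the table
SELECT name AS index_name,
       STATS_DATE(object_id, index_id) AS stats_last_updated
FROM sys.indexes
WHERE object_id = OBJECT_ID('Customers')
  AND index_id > 0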

The frequency of statistics update depends on the following:
1. Do BULK operations happen on the database tables in question?
2. Is there an off-peak period where a statistics update with full scan can be done?
3. How often is the data modified in the database tables in question?

Statistics update can be a resource-intensive operation depending on the size of the table. If the data in the tables in question changes very rarely, then a statistics update with a full scan can be done during a maintenance window. Statistics update is always an ONLINE operation and doesn’t cause the database to go into an OFFLINE mode.

If BULK operations happen on a table, then the statistics have a tendency to get skewed after the BULK operation. The best option is to perform a statistics update on this table if it is to be used by an application after the bulk operation.

If the database is updated very frequently and is very large in size, then it needs to be determined which tables are most frequently updated. Based on this, a statistics update can be performed for only those tables, and a statistics update for the entire database can be done with a lower sampling rate, like 20-30%, depending on what is suitable. This can be determined by comparing historical data and finding out what kind of sampling rate suits your needs.

Another option is to enable the AUTO UPDATE STATISTICS option for the database, but it needs to be monitored whether this is a boon or a bane. AUTO UPDATE STATISTICS is good when the statistics are relatively current, as it makes sure they don’t fall too far out of date. The downside appears when the statistics become outdated too frequently: AUTO UPDATE STATISTICS would then fire every time, and each such update causes all cached plans associated with the rows whose statistics have been updated to be recompiled. This can cause a serious bottleneck if too many auto update statistics events are fired in a short time span.
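Enabling the option (and verifying its current state) is straightforward; SalesDB is again a placeholder name:

-- Turn on automatic statistics updates for the database
ALTER DATABASE SalesDB SET AUTO_UPDATE_STATISTICS ON

-- Returns 1 when the option is enabled
SELECT DATABASEPROPERTYEX('SalesDB', 'IsAutoUpdateStatistics')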

The following methods can be used to update the statistics:
1. UPDATE STATISTICS WITH FULLSCAN (or a lower sampling rate). Please refer to the following article for further details: http://msdn2.microsoft.com/en-us/library/aa260645(SQL.80).aspx

2. sp_updatestats

3. A maintenance plan or SQL Agent job to update the statistics
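For instance, option 2 above, sp_updatestats, is a single call that updates statistics for all user tables in the current database using the default sampling rate:

-- Update out-of-date statistics across the current database
EXEC sp_updatestats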
A script along the lines of the sketch below would be helpful in determining how badly affected the statistics for an index are. It relies on rowmodctr, which counts the total number of inserted, deleted, or updated rows since the last time statistics were updated for the table; again, as noted above, this counter is dependable as a deciding factor on SQL Server 2000 only.
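A minimal sketch of such a check (the 1000-row threshold is illustrative; indid values 1-254 cover clustered and nonclustered indexes):

-- Tables/indexes with heavy modification since their last statistics update
SELECT OBJECT_NAME(id) AS table_name,
       name AS index_name,
       rowmodctr,
       STATS_DATE(id, indid) AS stats_last_updated
FROM sysindexes
WHERE indid BETWEEN 1 AND 254
  AND rowmodctr > 1000
ORDER BY rowmodctr DESC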

A DBCC DBREINDEX would update the statistics associated with those indexes, but would not update the AUTO CREATED STATISTICS (see the correction below). The DBCC SHOW_STATISTICS command can also help you determine the condition of the statistics for a particular index: if “Rows Sampled” and “Rows” are equal, the sampling is currently 100%.

Modification: May 13, 2011:

I had earlier stated that a DBCC DBREINDEX wouldn’t update the auto-created statistics, but this is incorrect. The current behavior is that auto-created statistics and manually created column statistics are not updated only when ALTER INDEX..REBUILD ALL is used. When you use DBCC DBREINDEX, however, it updates both the manually created and auto-created statistics with the default sampling rate.
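For reference, the sample output below comes from running the command against the index (Customers and pk_customers being the sample table and index):

DBCC SHOW_STATISTICS ('Customers', pk_customers)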


Statistics for INDEX 'pk_customers'.

Updated              Rows  Rows Sampled  Steps  Density       Average key length
-------------------  ----  ------------  -----  ------------  ------------------
Jun 23 2007  5:03PM  91    91            91     1.0989011E-2  10.0


The following articles could be helpful:
UPDATE STATISTICS
http://msdn2.microsoft.com/en-us/library/aa260645(SQL.80).aspx
Statistics used by Query Optimizer in SQL Server 2005
http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx

Reorganize data and index pages

There are a few ways that this can be achieved:

1. Shrinking the database files (data file or transaction log file)

2. DBCC DBREINDEX (Similar to dropping and re-creating the indexes)

3. DBCC INDEXDEFRAG (De-fragmenting the indexed pages to remove logical scan fragmentation)

Note: It needs to be kept in mind that DBREINDEX is a fully-logged operation and can cause the transaction log to grow by large amounts when the database is considerably large. There have been situations where the transaction log has grown out of proportion and filled up the entire disk. If the T-log does grow out of proportion and you choose to truncate/shrink the file, then please do so only after verifying the points below:

1. Transactional replication is not configured for the database

2. Log shipping is not configured for the database

3. Database mirroring is not configured for the database (For SQL 2005 SP1 and above)

4. No other operations are being performed that need transaction log consistency in terms of LSN chains

Online re-indexing is a feature available in SQL Server 2005 Enterprise Edition only. Please note that rebuilding the indexes causes the statistics associated with those indexes to also be updated; however, auto-created statistics are not updated when ALTER INDEX..REBUILD ALL is used (see the May 13, 2011 note above).
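An illustrative sketch on Enterprise Edition (the table and index names are hypothetical):

-- Rebuild a single index online; queries against the table keep running
ALTER INDEX pk_customers ON Customers REBUILD WITH (ONLINE = ON)

-- Rebuild every index on the table; note the caveat above about
-- auto-created column statistics when ALL is used
ALTER INDEX ALL ON Customers REBUILD WITH (ONLINE = ON)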

The following articles could be useful:

SET OPTION considerations when running DBCC with indexes on computed columns

http://support.microsoft.com/kb/301292/

DBCC DBREINDEX

http://msdn2.microsoft.com/en-us/library/ms181671.aspx

The DBCC SHOWCONTIG output for SQL Server 2000 and 2005 will help you find out the current status of fragmentation for the database. A sample output would look like this:

DBCC SHOWCONTIG scanning 'Customers' table...
Table: 'Customers' (2073058421); index ID: 1, database ID: 6
TABLE level scan performed.
- Pages Scanned................................: 3
- Extents Scanned..............................: 2
- Extent Switches..............................: 1
- Avg. Pages per Extent........................: 1.5
- Scan Density [Best Count:Actual Count].......: 50.00% [1:2]
- Logical Scan Fragmentation ..................: 0.00%
- Extent Scan Fragmentation ...................: 50.00%
- Avg. Bytes Free per Page.....................: 246.7
- Avg. Page Density (full).....................: 96.95%

The logical scan fragmentation should ideally be below 20%; anything above 40% would mean an INDEXDEFRAG or DBREINDEX is required. A script can run an INDEXDEFRAG or DBREINDEX automatically based on the logical scan fragmentation value.

A sample script for the same can be found under the topic of "DBCC SHOWCONTIG" in SQL Server Books Online (2000 and 2005)
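The Books Online script is the complete version; a much-simplified sketch of the same idea for a single table follows (the table name, the assumption of a clustered index with index ID 1, and the thresholds are illustrative):

-- Capture DBCC SHOWCONTIG output so fragmentation can be tested programmatically
CREATE TABLE #frag (
    ObjectName sysname, ObjectId int, IndexName sysname NULL, IndexId int,
    Lvl int, CountPages int, CountRows int, MinRecSize int, MaxRecSize int,
    AvgRecSize int, ForRecCount int, Extents int, ExtentSwitches int,
    AvgFreeBytes numeric(10,2), AvgPageDensity numeric(10,2),
    ScanDensity numeric(10,2), BestCount int, ActualCount int,
    LogicalFrag numeric(10,2), ExtentFrag numeric(10,2))

INSERT INTO #frag
EXEC ('DBCC SHOWCONTIG (''Customers'') WITH TABLERESULTS')

DECLARE @frag numeric(10,2)
SELECT @frag = LogicalFrag FROM #frag WHERE IndexId = 1

IF @frag > 40.0
    DBCC DBREINDEX ('Customers')             -- heavy fragmentation: full rebuild
ELSE IF @frag > 20.0
    DBCC INDEXDEFRAG (0, 'Customers', 1)     -- moderate: defragment in place

DROP TABLE #frag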

In SQL Server 2005, the following features may be helpful to you while using the ALTER INDEX command:

REBUILD [ WITH ( <rebuild_index_option> [ ,...n ] ) ]

Specifies the index will be rebuilt using the same columns, index type, uniqueness attribute, and sort order. This clause is equivalent to DBCC DBREINDEX. REBUILD enables a disabled index. Rebuilding a clustered index does not rebuild associated nonclustered indexes unless the keyword ALL is specified. If index options are not specified, the existing index option values stored in sys.indexes are applied. For any index option whose value is not stored in sys.indexes, the default indicated in the argument definition of the option applies.

The options ONLINE and IGNORE_DUP_KEY are not valid when you rebuild an XML index.

If ALL is specified and the underlying table is a heap, the rebuild operation has no effect on the table. Any nonclustered indexes associated with the table are rebuilt.

The rebuild operation can be minimally logged if the database recovery model is set to either bulk-logged or simple.

WITH ( LOB_COMPACTION = { ON | OFF } )

Specifies that all pages that contain large object (LOB) data are compacted. The LOB data types are image, text, ntext, varchar(max), nvarchar(max), varbinary(max), and xml. Compacting this data can improve disk space use. The default is ON.

ON

All pages that contain large object data are compacted. Reorganizing a specified clustered index compacts all LOB columns that are contained in the clustered index. Reorganizing a nonclustered index compacts all LOB columns that are nonkey (included) columns in the index. For more information, see Creating Indexes with Included Columns. When ALL is specified, all indexes that are associated with the specified table or view are reorganized, and all LOB columns that are associated with the clustered index, underlying table, or nonclustered index with included columns are compacted.

OFF

Pages that contain large object data are not compacted. OFF has no effect on a heap. The LOB_COMPACTION clause is ignored if LOB columns are not present.

REORGANIZE

Specifies the index leaf level will be reorganized. This clause is equivalent to DBCC INDEXDEFRAG. ALTER INDEX REORGANIZE statement is always performed online. This means long-term blocking table locks are not held and queries or updates to the underlying table can continue during the ALTER INDEX REORGANIZE transaction. REORGANIZE cannot be specified for a disabled index or an index with ALLOW_PAGE_LOCKS set to OFF.
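A short illustrative example (names hypothetical):

-- Reorganize the index leaf level online and compact LOB pages at the same time
ALTER INDEX pk_customers ON Customers REORGANIZE WITH (LOB_COMPACTION = ON)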

Furthermore, the dynamic management view sys.dm_db_index_physical_stats would help you get statistical information pertaining to indexes on SQL Server 2005. The columns mentioned below in this DMV output could be helpful in analyzing the need for index reorganization.

avg_fragmentation_in_percent: Logical fragmentation for indexes, or extent fragmentation for heaps in the IN_ROW_DATA allocation unit. The value is measured as a percentage and takes into account multiple files. For definitions of logical and extent fragmentation, see Remarks. 0 for LOB_DATA and ROW_OVERFLOW_DATA allocation units. NULL for heaps when mode = SAMPLED.

fragment_count: Number of fragments in the leaf level of an IN_ROW_DATA allocation unit. For more information about fragments, see Remarks. NULL for nonleaf levels of an index, and for LOB_DATA or ROW_OVERFLOW_DATA allocation units. NULL for heaps when mode = SAMPLED.

avg_fragment_size_in_pages: Average number of pages in one fragment in the leaf level of an IN_ROW_DATA allocation unit. NULL for nonleaf levels of an index, and for LOB_DATA or ROW_OVERFLOW_DATA allocation units. NULL for heaps when mode = SAMPLED.
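A sketch of how these columns might be queried for the current database (LIMITED is the default and fastest scan mode):

-- Fragmentation overview for all indexes in the current database
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.fragment_count,
       ps.avg_fragment_size_in_pages
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC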

The following article could be helpful:
Microsoft SQL Server 2000 Index Defragmentation Best Practices

http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ss2kidbp.mspx

Database Maintenance Plans

Database Maintenance Plans for SQL Server 2000
Database Maintenance Plans offer the following features in SQL Server 2000:
1. Backing up databases
2. Reorganization of indexes and data pages
3. Statistics Update
4. Shrinking the database
5. Database integrity checks
Each of the above maintenance plan tasks has an associated SQL Agent job, and these can be scheduled to suit your needs. Furthermore, these jobs can also be created manually, without the help of maintenance plans.

Database Maintenance Plans for SQL Server 2005
In addition to the above mentioned features, SQL Server 2005 Enterprise Edition provides the option of rebuilding the indexes as an online operation. This makes it possible for the indexes to remain online while a rebuild index task is being performed on the database.
ONLINE = { ON | OFF }
Specifies whether underlying tables and associated indexes are available for queries and data modification during the index operation. The default is OFF.
Note: Online index operations are available only in SQL Server 2005 Enterprise Edition.
ON
Long-term table locks are not held for the duration of the index operation. During the main phase of the index operation, only an Intent Share (IS) lock is held on the source table. This enables queries or updates to the underlying table and indexes to continue. At the start of the operation, a Shared (S) lock is held on the source object for a very short amount of time. At the end of the operation, for a short time, an S (Shared) lock is acquired on the source if a nonclustered index is being created; or an SCH-M (Schema Modification) lock is acquired when a clustered index is created or dropped online and when a clustered or nonclustered index is being rebuilt. ONLINE cannot be set to ON when an index is being created on a local temporary table.
OFF
Table locks are applied for the duration of the index operation. An offline index operation that creates, rebuilds, or drops a clustered index, or rebuilds or drops a nonclustered index, acquires a Schema modification (Sch-M) lock on the table. This prevents all user access to the underlying table for the duration of the operation. An offline index operation that creates a nonclustered index acquires a Shared (S) lock on the table. This prevents updates to the underlying table but allows read operations, such as SELECT statements.

The frequency of these tasks, and which tasks need to be performed, should be decided based on the following factors:
1. Importance of the database in question
2. Size of the database
3. Maintenance window frequency
4. Load on the server
5. Whether it is an OLTP or OLAP database or both
6. The kind of transactions that take place on the database (Points 5 and 6 would decide how often index reorganization/index rebuild/statistics updates are required and whether a shrink database/file is required)