Canonical ACLs: You said wwwhhat?

A while back, I had to debug the SQL Server 2008 setup code to find out why the Registry Consistency Check rule was taking an obnoxiously long time to complete on a particular server. When you apply SQL Server 2008 SP2, the Registry Consistency Check rule is executed; in this case, the setup UI didn’t show any progress for over an hour. When I checked a previous set of setup logs, I found the following failure (from the Detail.txt file):

2011-04-17 15:06:53 Slp: d:14c36b1c9c26b2820a8\x64\FixSqlRegistryKey_x86.exe exit code: 1
2011-04-17 15:06:53 Slp: Rule ‘AclPermissionsFacet’ detection result: AreSqlAclPermissionsCanonical= False
2011-04-17 15:06:53 Slp: Evaluating rule : AclPermissionsFacet
2011-04-17 15:06:53 Slp: Rule running on machine: <machine name>
2011-04-17 15:06:53 Slp: Rule evaluation done : Failed
2011-04-17 15:06:53 Slp: Rule evaluation message: The SQL Server registry keys from a prior installation cannot be modified. To continue, see SQL Server Setup documentation about how to fix registry keys.

In another setup attempt (again from the Detail.txt file), I found that FixSqlRegistryKey_x86.exe did not log any message after the EXE was launched:

2011-04-29 11:42:58 Slp: D:\Microsoft SQL 2008 SP2\x64\FixSqlRegistryKey_x64.exe exit code: 0
2011-04-29 11:42:58 Slp: Launching external tool: D:\Microsoft SQL 2008 SP2\x64\FixSqlRegistryKey_x86.exe
2011-04-29 11:42:58 Slp

If you capture a Process Monitor trace, you will find that FixSqlRegistryKey_*.exe traverses the registry keys in the SQL Server registry hive. The EXE looks up and fixes all non-canonical ACLs on each registry key and its child keys for the two hives below:

HKLM\Software\Microsoft\Microsoft SQL Server\
HKLM\Software\Microsoft\MSSQLServer\

Did I just read the word canonical?!? Yes, you did. A canonical ACL must have its ACEs sorted in the following order:

1. Access-denied on the object
2. Access-denied on a child or property
3. Access-allowed on the object
4. Access-allowed on a child or property
5. All inherited ACEs

Reference: http://blogs.msdn.com/b/patcarna/archive/2009/02/09/the-access-control-list-is-not-canonical.aspx
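
If you are curious whether a particular key's DACL is already canonical, you can inspect it from PowerShell. A minimal sketch, using one of the SQL Server keys purely as an illustration:

# Inspect one key's DACL and the order of its ACEs
$acl = Get-Acl "HKLM:\Software\Microsoft\MSSQLServer"
$acl.AreAccessRulesCanonical   # True if the ACEs already follow the order above
$acl.Access | Format-Table AccessControlType, IdentityReference, IsInherited -AutoSize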

If SQL Setup finds ACLs in non-canonical form, it will try to fix the ACLs for the registry key and its sub-keys that have ACLs in non-canonical form. If permission issues are encountered while traversing the registry, the rule can take a long time to error out or fail with the error message mentioned above.

You have the following options to correct the non-canonical ACLs:

1. Capture a Process Monitor trace and check which registry key the EXE takes a long time to move forward from, or where it encounters an access denied error.
2. Locate the non-canonical ACLs in the two hives mentioned above and fix them manually (see the sketch right after this list).
3. Re-run the rules.
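
For the manual fix in option 2, the general idea is to copy the key's explicit ACEs into a fresh security descriptor, because AddAccessRule inserts rules in canonical order, and then write it back. The sketch below illustrates that approach for a single, hypothetical key; it is not the logic that FixSqlRegistryKey_*.exe uses, so test it on a non-production machine first:

# Rebuild the DACL of one key in canonical order (hypothetical key path)
$key = "HKLM:\Software\Microsoft\MSSQLServer"
$acl = Get-Acl $key
$fixed = New-Object System.Security.AccessControl.RegistrySecurity
# Copy only the explicit ACEs; inherited ones flow back in from the parent key
foreach ($rule in $acl.GetAccessRules($true, $false, [System.Security.Principal.NTAccount]))
{
    $fixed.AddAccessRule($rule)
}
Set-Acl -Path $key -AclObject $fixed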

Note: Always use a backing file pointed to a local disk when capturing Process Monitor traces for prolonged periods. Refer to this post for more details.

Once you have fixed the non-canonical ACLs, you should be able to get past the Registry Consistency Check rule without having to grab lunch and dinner in between!

Since I didn’t want to sit and traverse all the registry keys manually, I wrote a PowerShell script to check whether the ACLs for your SQL Server registry hives and their sub-keys are in canonical form, and to report back any key that is not.


Function fnCheckCanonical
{
    Param ([string] $regvalue)

    Write-Host "Checking hive : " $regvalue

    # Enumerate the key and all of its sub-keys
    $reg = Get-ChildItem $regvalue -Recurse

    foreach ($path in $reg)
    {
        # Get-Acl expects the PowerShell drive form (HKLM:\...) rather than HKEY_LOCAL_MACHINE\...
        $regkey = $path.Name.Replace("HKEY_LOCAL_MACHINE", "HKLM:")

        $acl = Get-Acl -Path $regkey

        if ($acl.AreAccessRulesCanonical)
        {
            Write-Host "Verified key: " $acl.Path
        }
        else
        {
            Write-Host "Reg Key: " $acl.Path
            Write-Host "IsCanonical: " $acl.AreAccessRulesCanonical
            Write-Host "Failed canonical check"
        }
    }
}

fnCheckCanonical "HKLM:\Software\Microsoft\Microsoft SQL Server"
fnCheckCanonical "HKLM:\Software\Microsoft\MSSQLServer"
fnCheckCanonical "HKLM:\SOFTWARE\Wow6432Node\Microsoft\Microsoft SQL Server"
fnCheckCanonical "HKLM:\SOFTWARE\Wow6432Node\Microsoft\MSSQLServer"

ACLs don’t change into non-canonical form just because they are having a bad day, but it can happen when you use scripting tasks from .NET/VBScript/PowerShell or tools like subinacl.exe to incorrectly modify/add/remove ACLs on a registry key. Digging into the root cause of how they actually got into that state is as good as searching for a needle in all the haystacks in the world! Here is an article on how subinacl.exe could arrange ACLs incorrectly on Windows 2000: http://support.microsoft.com/kb/296865

Hope this helps you in fixing your non-canonical ACLs if you are unfortunate enough to run into this issue.


Statistics Auto Update may not be automatically the best choice

I had written a post a while back describing how DBCC DBREINDEX and ALTER INDEX treat statistics updates differently for user-created and auto-created statistics on a database table. In this post, I shall talk about how auto-update statistics, which is normally left ON on most databases that I have worked on, can sometimes choose to play “spoil sport” with the all-important cardinality estimation during the plan generation phase. The auto-update statistics option of a database affects all table statistics: index, user-created, and auto-created.

As usual, my favorite way of illustrating a SQL Server behavior is to use an example. I created a table with two integer columns and inserted a million rows into it. There are two indexes on the table: a clustered index on the primary key (column a) and a non-clustered index on the second column (column b). The DBCC SHOW_STATISTICS output for the index on column b shows the following:

Name         Updated              Rows                 Rows Sampled
idx_b        Jul  8 2011  6:34AM  1000000              482999              
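
In case you want to play along, here is a minimal sketch of how such a table can be set up. The table, column, and index names match the example in this post; the data distribution itself is an assumption, so your numbers will differ:

-- Table with a clustered index on column a and a non-clustered index on column b
CREATE TABLE dbo.tblStats (a INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, b INT NOT NULL);
CREATE NONCLUSTERED INDEX idx_b ON dbo.tblStats (b);

-- Load a million rows with only a few distinct values in column b
INSERT INTO dbo.tblStats (b)
SELECT TOP (1000000) ABS(CHECKSUM(NEWID())) % 3 + 4
FROM sys.all_objects AS o1 CROSS JOIN sys.all_objects AS o2;

-- Compare the Rows and Rows Sampled columns in the statistics header
DBCC SHOW_STATISTICS ('dbo.tblStats', 'idx_b');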

If you look at the data above, you will notice that Rows and Rows Sampled differ, which means that the statistics were updated with less than a 100% sampling rate. There are only 3 distinct values for this particular column. The following query returned 26,000 rows, but the estimated number of rows was way off.

select a,b from dbo.tblStats where b = 4

Plan snippet:

Rows      EstimateRows
26000     2952.387
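
If you want to see the actual and estimated row counts side by side without opening the graphical plan, one option is SET STATISTICS PROFILE, which returns the plan rows, including the Rows and EstimateRows columns shown above, along with the query output:

SET STATISTICS PROFILE ON;
select a,b from dbo.tblStats where b = 4;
SET STATISTICS PROFILE OFF;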

Now when I updated the statistics with a fullscan on the table and generated a new plan for the same query… voila… I had the correct estimated rows, and the histogram also showed me the correct data distribution. In my scenario, I had 400,000 records in the table, after which I ran a WHILE loop to insert 600,000 additional records. Until that point, the statistics had been updated with a fullscan. After that, an auto-update statistics operation lowered the sampling rate, because when auto-update statistics kicks in it uses a default sampling rate that is a function of the table size: the larger the table, the lower the sampling percentage.

This is not necessarily a bad thing. For example, if a statistics update kicks in during query execution and the Auto Update Statistics Asynchronously option is not set for the database, you don't want the statistics update operation to sample the entire table, update the statistics histogram, and only then execute the query. So the default sampling allows for quick statistics gathering even for large tables. The default sampling is also used when a CREATE STATISTICS command is executed on the table.
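
Refreshing the example's statistics with a 100% sample is a one-liner; after it runs, Rows and Rows Sampled in the statistics header should match:

-- Update the statistics on idx_b with a 100% sample and re-check the header
UPDATE STATISTICS dbo.tblStats idx_b WITH FULLSCAN;
DBCC SHOW_STATISTICS ('dbo.tblStats', 'idx_b');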

Now why is this important to be aware of when designing your statistics update strategy for VLDBs or OLTP environments? Queries in production environments can involve JOIN operations and other complex constructs, which means that the cardinality estimate of the rows to be fetched determines which operators are used in the plan. Over- and under-estimating can both prove detrimental to the performance of the query plan.

Moral of the story: Statistics for a table need to be updated with a fullscan if there have been a significant number of changes made to the table data.

The frequency of statistics updates has to be determined based on the number of updates your frequently used tables receive and on when the instance has some free time to handle the statistics update operations. This is the famous “It Depends” answer. Even though a statistics update is an online operation, it does consume resources and can potentially cause blocking while the table is scanned to determine the data distribution. It is also not always necessary to rebuild indexes just to update statistics. Read the Books Online article “Using Statistics to Improve Query Performance” for multiple options, like Asynchronous Statistics Update, the NoRecompute option, etc., and how they can be pertinent to your situation in case you have large databases where statistics updates cannot be done very frequently.
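
For example, the asynchronous statistics update behavior mentioned above is a database-level setting; the database name below is just a placeholder:

-- Let queries compile using existing statistics while the update runs in the background
ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS_ASYNC ON;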

Starting with SQL Server 2005, do not go by the row modification counter value, as statistics updates are triggered based on column modification counters rather than row modifications.

There are edge-case scenarios where a 100% sampling might not be best choice because of a skew in data distribution, but that is a story for another day!

My colleague Jack wrote some posts on interesting issues that he ran into with data distribution in tables and statistics sampling:
Parallel Index Creation performance issue
http://blogs.msdn.com/b/psssql/archive/2009/03/18/be-aware-of-parallel-index-creation-performance-issues.aspx
Why does this query consumes so much CPU?
http://blogs.msdn.com/b/psssql/archive/2011/04/29/why-does-this-query-consumes-so-much-cpu.aspx
Sampling can produce less accurate statistics if the data is not evenly distributed
http://blogs.msdn.com/b/psssql/archive/2010/07/09/sampling-can-produce-less-accurate-statistics-if-the-data-is-not-evenly-distributed.aspx

Reference:
Statistics used by the Query Optimizer in SQL Server 2005
http://technet.microsoft.com/en-us/library/cc966419.aspx