What does cntr_type mean?

Have you ever wondered if the cntr_type column value in the sys.sysperfinfo or sys.dm_os_performance_counters output has a significant meaning or not. Well since the column value is there, it obviously has a meaning. Additionally, if the raw values represented by the output of some of the counter types is considered at face value, then your performance base lining can take a severe beating.

Each cntr_type value meaning can be found from the WMI Performance Counter Type or Windows Server Performance Counter Type documentation on MSDN. 

The common counter types in SQL Server are:
PERF_COUNTER_RAWCOUNT | Decimal | 65536
Raw counter value that does not require calculations, and represents one sample.

PERF_COUNTER_LARGE_RAWCOUNT | Decimal | 65792
Same as PERF_COUNTER_RAWCOUNT, but a 64-bit representation for larger values.

PERF_COUNTER_COUNTER | Decimal | 272696320
Average number of operations completed during each second of the sample interval. NOTE: For "per-second counters", this value is cumulative. The rate value must be calculated by sampling the value at discrete time intervals. The difference between any two successive sample values is equal to the rate for the time interval used. For example, batch requests/sec is a per-second counter, it would show cumulative values.

PERF_COUNTER_BULK_COUNT | Decimal | 272696576
Average number of operations completed during each second of the sample interval. This counter type is the same as the PERF_COUNTER_COUNTER type, but it uses larger fields to accommodate larger values.

PERF_AVERAGE_BULK | Decimal | 1073874176 | Decimal | 537003264
Number of items processed, on average, during an operation. This counter type displays a ratio of the items processed (such as bytes sent) to the number of operations completed, and requires a base property with PERF_AVERAGE_BASE as the counter type.

PERF_LARGE_RAW_BASE | Decimal | 1073939712
Base value found in the calculation of PERF_RAW_FRACTION, 64 bits.

Example:
If you had the following values:
SQLServer:Plan Cache | Cache Hit Ratio | Temporary Tables & Table Variables | 381
SQLServer:Plan Cache | Cache Hit Ratio Base | Temporary Tables & Table Variables | 386
Then the Temp Table/Variable cache hit ratio percentage would be: 98.7% (approx.)

You can use the query below to get the comments for each counter type as discussed above:

select object_name,counter_name,instance_name,cntr_value,
case cntr_type 
	when 65792 then 'Absolute Meaning' 
	when 65536 then 'Absolute Meaning' 
	when 272696576 then 'Per Second counter and is Cumulative in Nature'
	when 1073874176 then 'Bulk Counter. To get correct value, this value needs to be divided by Base Counter value'
	when 537003264 then 'Bulk Counter. To get correct value, this value needs to be divided by Base Counter value' 
end as counter_comments
from sys.dm_os_performance_counters
where cntr_type not in (1073939712)

 

Documentation on MSDN:

WMI Performance Counter Types

http://msdn.microsoft.com/en-us/library/aa394569(VS.85).aspx

SQL Server 2005 BOL Topic

sys.dm_os_performance_counters (Transact-SQL) 

The broad classes of counters are as follows:

Non-computational Counter Types

http://msdn.microsoft.com/en-us/library/aa392713(VS.85).aspx

Basic Algorithm Counter Types

http://msdn.microsoft.com/en-us/library/aa384813(VS.85).aspx

Counter Algorithm Counter Types

http://msdn.microsoft.com/en-us/library/aa389384(VS.85).aspx

Timer Algorithm Counter Types

http://msdn.microsoft.com/en-us/library/aa393909(VS.85).aspx

Precision Timer Algorithm Counter Types

http://msdn.microsoft.com/en-us/library/aa392755(VS.85).aspx

Queue-length Algorithm Counter Types

http://msdn.microsoft.com/en-us/library/aa392905(VS.85).aspx

Base Counter Types

http://msdn.microsoft.com/en-us/library/aa384811(VS.85).aspx

Statistical Counter Types

http://msdn.microsoft.com/en-us/library/aa393663(VS.85).aspx

Setup – RPC, please be nice to me

Recently, I had looked into a SQL Server 2008 cluster setup failure where the following information was noted in the Detail.txt file:

2011-02-21 11:58:37 Slp: Configuration action failed for feature SQL_Engine_Core_Inst during timing ConfigRC and scenario ConfigRC.
2011-02-21 11:58:37 Slp: The RPC server is unavailable
2011-02-21 11:58:37 Slp: Configuration action failed for feature SQL_Engine_Core_Inst during timing ConfigRC and scenario ConfigRC.
2011-02-21 11:58:37 Slp: System.ComponentModel.Win32Exception: The RPC server is unavailable
2011-02-21 11:58:37 Slp:    at Microsoft.SqlServer.Configuration.Cluster.ClusterResource.UpgradeResourceDLL(String nodeName, String dllPathName)
2011-02-21 11:58:37 Slp:    at Microsoft.SqlServer.Configuration.SqlEngine.SQLEngineClusterFeature.UpgradeResourceDLL(SQLServiceResource sqlResource)
2011-02-21 11:58:37 Slp:    at Microsoft.SqlServer.Configuration.SqlEngine.SQLEngineClusterFeature.ConfigureSQLEngineResourceType()
2011-02-21 11:58:37 Slp:    at Microsoft.SqlServer.Configuration.SqlEngine.SqlEngineSetupPrivate.Patch_ConfigRC(EffectiveProperties properties)
2011-02-21 11:58:37 Slp:    at Microsoft.SqlServer.Configuration.SqlEngine.SqlEngineSetupPrivate.Patch(ConfigActionTiming timing, Dictionary`2 actionData, PublicConfigurationBase spcb)
2011-02-21 11:58:37 Slp:    at Microsoft.SqlServer.Configuration.SqlConfigBase.SqlFeatureConfigBase.Execute(ConfigActionScenario scenario, ConfigActionTiming timing, Dictionary`2 actionData, PublicConfigurationBase spcbCurrent)
2011-02-21 11:58:37 Slp:    at Microsoft.SqlServer.Configuration.SqlConfigBase.SlpConfigAction.ExecuteAction(String actionId)
2011-02-21 11:58:37 Slp:    at Microsoft.SqlServer.Configuration.SqlConfigBase.SlpConfigAction.Execute(String actionId, TextWriter errorStream)
2011-02-21 11:58:37 Slp: Exception: System.ComponentModel.Win32Exception.
2011-02-21 11:58:37 Slp: Source: Microsoft.SqlServer.Configuration.Cluster.
2011-02-21 11:58:37 Slp: Message: The RPC server is unavailable.

The RPC Server is unavailable is a pretty generic error and can happen due to a lot of reasons. One of the common reasons that we have seen in CSS while troubleshooting setup issues is with incorrect DNS entries. This can happen if DNS entry for the cluster network name wasn’t configured properly and is pointing to an incorrect IP Address.

Another point to keep in mind is Windows Firewall. Make sure that the Firewall configuration allows this RPC call through.

I worked with Shahryar (Twitter) on a similar issue last week and it was identified that a PING request to the cluster name or IP returned no response.

If all else fails, then network tracing would help in figuring out what is failing and where.

References:

Error message when you connect to a cluster virtual server by using the named pipes protocol: "The machines selected for remote communication is not available at this time."
http://support.microsoft.com/kb/306985/

Downloading that cumulative update or hotfix…

I recently answered a question on Twitter regarding the availability of Cumulative Update downloads for specific platforms.

Starting from SQL Server 2005, our Release Services team moved to an Incremental Servicing Model to deliver fixes for issues identified on current releases of SQL Server[s]. We now release a Cumulative Update package for a Service Pack periodically which contains all the fixes released since the RTM version of the Service Pack and hence the name “Cumulative Update”.

 image

 image

Now you click on the link mentioned in the picture below, you will get an option to download the packages associated for all the platforms.

image
I know this is quite trivial information but I have answered this question a few times and hence the post.

Troubleshooting that darn backup error

Backup and restore operations even though they are supposed to be a no-brainer, it can leave you wanting to tear your hair out. Considering the different elements that influence the outcome of a successful backup operation like backup media, consistency of the backup, network issues (if backing up to a UNC path), consistency of the database being backed up etc.

I shall attempt to outline a set of troubleshooting steps specifically for non-native backup/restore related errors for SQL Server databases.

Any BACKUP command that completes successfully or with errors has associated messages logged in the SQL Server Errorlog.

A successful backup from the SQL Server Errorlog:

Database backed up. Database: adventureworks, creation date(time): 2010/05/28(03:00:18), pages dumped: 17291, first LSN: 67:7733:56, last LSN: 67:7761:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {‘F:\adventureworks.bak’}).

You will see that the backup completion message has the database name, the number of pages that were contained in the backup, the date the backup was created and the number of devices and their type (in this case files). So you can reconstruct the sequence of backups even if your backup information history is missing from the MSDB system tables and you just have your SQL Server Errorlogs to play with. If you want to get real fancy about this, then you could spin up a nifty little Powershell or VBScript to parse through the SQL Errorlogs and provide the backup sequence to you as well.

Now, let’s look at a failed backup message from the SQL Errorlog.

Error: 18204, Severity: 16, State: 1.
BackupDiskFile::CreateMedia: Backup device ‘R:\adventureworks.bak’ failed to create.
Operating system error 3(The system cannot find the path specified.).
Error: 3041, Severity: 16, State: 1.
BACKUP failed to complete the command BACKUP DATABASE adventureworks.

You will notice above that the Operating System error code associated with the backup failure is reported in the error message. More often than not, the error message is self explanatory. If the error code doesn’t throw back an error text along with it, you can get the windows error code associated with the error code using net helpmsg <error number> from a command prompt window.

In the above case, there wasn’t any R: drive on my server. Now that I have finished stating the most obvious of troubleshooting methodologies for SQL native backups which was done to set the context for the next part of this post. The troubleshooting methodology for non-native backups.

Non-native SQL backups of databases use one of the following methods:

1. Use of APIs exposed through SQLVDI.dll to perform a VDI based backup
2. Use of VSS framework to perform a snapshot backup of the SQL Server database

Note: In this post, I shall not be addressing any storage level backup solutions that have options or features to handles SQL database backups.

This post is aimed at discussing the tackling of VSS/VDI related errors that you face while performing a database backup.

If a VDI backup fails, then you can try the following sequence of troubleshooting steps:

1. Attempt a backup to the same location using native SQL backup from a query window or SQLCMD. This might not always be possible as some of the backup tools do not allow any other backup application to connect to the backup share apart from the backup tool’s agents.
2. The next thing that you can do is to ensure that your SQLVDI.DLL is updated to the latest build. For this you can use the SQL Server Backup Simulator available on Code Gallery using the “Validate VDI Installation” option in the tool. This check in the tool will perform basic checks like current DLL version and additional checks based on root causes of common scenarios that CSS has seen in the past for VDI backup failures. Additional information on the usage of the tool is available here.
3. Perform a backup of the database using the tool for which the backup is failing (to the same destination if possible). The parameter tweaks at this point are not available in the current version like striped backups, changes to MAXTRANSFERSIZE/BUFFER COUNT etc. but if the backup from this tool is successful, then you know for a fact that the SQLVDI APIs are working as expected. Then it is either an additional piece of logic in the backup software that is failing or there is an external factors like network/backup media or there is a resource crunch on the SQL instance on which the database resides.


Common errors that you might see during a VDI backup operation failure:

Error message 1
BackupVirtualDeviceFile::ClearError: failure on backup device ‘VDI_ DeviceID ‘. Operating system error 995(The I/O operation has
been aborted because of either a thread exit or an application request.).

Error message 2
Error: 18210, Severity: 16, State: 1.
BackupMedium::ReportIoError: write failure on backup device ‘VDI_ DeviceID ‘. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.)

Error message 3
Error: 18210, Severity: 16, State: 1. 
BackupVirtualDeviceFile::RequestDurableMedia: Flush failure on backup device ‘VDI_ DeviceID. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.)

The above errors are mostly due to outdated versions of SQLVDI.DLL on SQL Server 2000 or SQL Server 2005 instances. The “Validate VDI Installation” option in the Backup Simulator should be able to identify any known issues with DLL version mismatch for you.


If you have an application that uses Volume Shadow Service to backup your SQL database and that backup fails, then you your troubleshooting steps would be a bit different. The tell-tale sign of a VSS Snapshot backup is in the I/O Frozen message being written into the SQL Errorlogs for the databases being backed up. In such a failure scenario, take a backup of the database using Windows Server Backup Admin (Windows Server 2008 and above) or NTBackup utility to perform a backup of the database files involved. If this backup is successful, it means that a snapshot backup is successfully completing for the database. Now depending on your backup application’s implementation logic, this might be a resource crunch or a backup storage media incompatibility or even a resource crunch.

Common errors associated with VSS backups:

Issue# 1
Error: 3041, Severity: 16, State: 1.
Backup      BACKUP failed to complete the command BACKUP DATABASE <database name>. Check the backup application log for detailed messages.
SubprocessMgr::EnqueueSubprocess: Limit on ‘Max worker threads’ reached.
This error message is raised when your backup application uses VSS to backup a large number of databases simultaneously. More information around this can be found in this post Volume Shadow barfs with 0x80040e14 code

Issue #2
While performing a Bare Metal backup you might run into issues if your SQL Server binaries are located on a different drive other than the C: drive. More information on the same is mentioned here: Location of SQL Binaries can flip out Bare Metal backups

Issue #3

Event Type:    Error
Event Source:    VSS
Event Category:    None
Event ID:    6005
Description:
Sqllib error: Database %1 is stored on multiple volumes, only some of which are being shadowed.

The above error occurs when you backup only one of the volumes that contains SQL database data/log files which are spread over multiple volumes. This is explained in detailed in this post: Why is VSS complaining about SQL Data files


More information on how VSS/VDI work is present in the blog post by Sudarshan: INFORMATIONAL- SHEDDING LIGHT on VSS & VDI Backups in SQL Server

Previous posts related to Volume Shadow Backups
IO Frozen messages while taking NT Backup for SQL databases

Previous posts related to VDI backups
SQL VDI backup fails with 0×80770007

Two starts, one recompile and a confusing place

I used to be regular viewer of the TV Series “Two Guys, A Girl and A Pizza Place” co-starring Ryan Reynolds. The blog post title is a play on that name. In the past week, I have heard about this phenomenon of “one ghost Stmt: Starting” event more than thrice which is what triggered this blog post.

When you are analyzing such a trace file using RML Utilities, you will find an entry in the tblStatements table showing a Statement for a particular Batch Sequence (Column name: BatchSeq) which has a Start Time but no End Time. Additionally the AttnSeq column for that row would have a NULL value indicating that the statement didn’t timeout nor was it cancelled due to a Server/Client side error.

So what is this ghost entry all about. If you manually dig through the profiler trace, you will find that that the Statement in question has a Recompile event associated with it. This is by-design. When a Stored Procedure statement encounters a Recompile event, it would have a SP: StmtStarting event associated with it but no corresponding SP: StmtCompleted event. This can lead to confusion during trace file analysis.

Here is an example illustrating the same. I have a Stored Procedure whose definition is as follows:

CREATE PROC usp_recompile
AS
SELECT * FROM tbl1 OPTION (RECOMPILE)

The Recompile hint is added to forcefully induce a Recompile for the Stored Procedure when it is executed. If you look into the profiler events captured below during the Stored Procedure execution, you will see that there is a SP:StmtStarting event followed by a SP:Recompile and a SP:StmtRecompile event. Then there is another SP:StmtStarting event indicating that the recompile operation is complete and the start of the execution of the SP statement.

image

Hope this helps clear up the confusion around this topic.