SQL Saturday #116: Are you gonna be there?


I am really looking forward to SQL Saturday 116, the first SQL Saturday being organized in Bangalore, at the Microsoft office on January 7th, 2012. There is some interesting stuff lined up for the day across two tracks: Main and Practical. You can look up the schedule here.

Not only do you have an amazing line-up of speakers, but there’s also going to be a SQL Server Clinic. Those of you who have attended the SQL PASS Summit might be familiar with this concept: MSFT CSS Engineers with deep technical expertise in SQL Server are available to discuss the real-world SQL problems you face in the environments you manage or work on. We love the challenge of new problems that we have not seen before and derive an immense amount of satisfaction from helping someone resolve a tough issue involving SQL Server! If you have a specific problem to solve, it helps to bring the details: ERRORLOG files, error messages, specific query syntax, or details of your environment. The more background context you provide, the more we can narrow down our suggestions. No presentations, just some serious troubleshooting and on-the-spot assistance from 2:30PM IST to 5PM IST under the Practical track.

I am glad that I will have the opportunity to present at SQL Saturday #116 along with the likes of Pinal Dave, Vinod Kumar and Balmukund Lakhani. My presentation, “Demystifying SQL Server Memory Dumps”, is at 11:30AM IST. As the title suggests, I am going to show why digging into the root causes of SQL Server memory dumps need not be a daunting task!

I am totally looking forward to Jan 7th!! That would be a cracker of a way to start off the first weekend of the New Year! Are you gonna be there?

Sysdatabases is empty–You say Whaaat?


While trying to back up a SQL Server database using SQL Writer (snapshot backups using the VSS framework), you might notice the following error in the application event log.

Log Name: Application
Source: SQLWRITER
Event ID: 24581
Task Category: None
Level: Error
Description:
Sqllib error: System table sys.sysdatabases in SQL Server instance <SQL Server Name> is empty.

The above error is a bit misleading because it doesn’t literally mean that all your databases on the SQL Server instance reported have disappeared. If you scour the application event logs, you will find that another error is reported along with this error during the same timeframe.

Log Name: Application
Source: SQLWRITER
Event ID: 24583
Level: Error
Description:
Sqllib error: OLEDB Error encountered calling IDBInitialize::Initialize. hr = 0x80040e4d. SQLSTATE: 28000, Native Error: 18456
Error state: 1, Severity: 14
Source: Microsoft SQL Server Native Client 10.0
Error message: Login failed for user ‘NT AUTHORITY\SYSTEM’.
DBPROP_INIT_DATASOURCE: <SQL Server Name>
DBPROP_INIT_CATALOG: master
DBPROP_AUTH_INTEGRATED: SSPI

Now it is clear that the SYSTEM account is unable to log into the SQL Server instance mentioned in the error message above. If you look into the SQL Errorlog, you will find the following error message:

Error: 18456, Severity: 14, State: 11.
Login failed for user ‘NT AUTHORITY\SYSTEM’. Reason: Token-based server access validation failed with an infrastructure error. Check for previous errors. [CLIENT: <local machine>]

State 11 is basically telling you that the account doesn’t have access to the SQL Server instance. The reason for the login failure is included in the message on SQL Server 2008 instances and above. There is a catch here: the Login Failed error message is reported in the SQL Errorlog and the Application Event Log only if the SYSTEM account has been granted access to the SQL Server instance but has not been granted the permissions needed to get the database list from the sys.databases catalog view. The hex code 0x80040e4d corresponds to an authentication failure.

So why does this happen? When you initiate a backup of a SQL Server database through SQLWriter (VSS framework), SQLWriter tries to connect to all online instances to build a list of the files associated with each database. SQLWriter/VSS needs this information to create the exclusion list. The SQLWriter service runs under the NT AUTHORITY\SYSTEM account. So if you have multiple instances of SQL Server online on the same server, the SYSTEM account needs either SYSADMIN permissions or the permissions necessary to query the sys.databases catalog view on each of those instances. This requirement is by design and is documented in the KB article mentioned below. However, if a SQL Server instance is not started, SQLWriter is not bothered about it, because I/O need not be frozen for the database files of an instance that is shut down.

919023    SQL Server 2005 connectivity and Volume Shadow Copy Service (VSS)
http://support.microsoft.com/default.aspx?scid=kb;EN-US;919023

Excerpt from above article:

Additionally, because of the types of operations that the writer must perform, we recommend that you do not remove the NT AUTHORITY\SYSTEM login from the sysadmin server role.

Now if you do not want to grant sysadmin permission to the System account, then you need to do the following to prevent the above error:

1. Grant the db_datareader role to the SYSTEM account on the master database. This is required for other queries that SQLWriter may need to execute during the course of the backup.
2. Grant the ALTER ANY DATABASE or VIEW ANY DATABASE server-level permission, or the CREATE DATABASE permission in the master database, to the SYSTEM account. These are the minimum permissions required to query the sys.databases catalog view.

Normally the SYSTEM account is part of the Built-in Administrators group on the box, unless it has been removed as part of security hardening. You need to grant the above permissions to the SYSTEM account on all SQL Server instances that are installed and online on the server where you are trying to perform the VSS backup.
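
The two grants listed above can be sketched in T-SQL roughly as follows. This is a minimal sketch, not a definitive script: it assumes you chose VIEW ANY DATABASE (rather than ALTER ANY DATABASE or CREATE DATABASE) for the second step, and the IF NOT EXISTS checks that create the login and the master database user are my own additions for the case where they are missing. Run it on each online instance on the server.

```sql
USE master;
GO
-- Assumption: create the Windows login only if it does not already exist on this instance.
IF NOT EXISTS (SELECT 1 FROM sys.server_principals WHERE name = N'NT AUTHORITY\SYSTEM')
    CREATE LOGIN [NT AUTHORITY\SYSTEM] FROM WINDOWS;
GO
-- Step 2: server-level permission needed to see every row in sys.databases.
GRANT VIEW ANY DATABASE TO [NT AUTHORITY\SYSTEM];
GO
-- Assumption: map the login to a user in master if no such user exists yet.
IF NOT EXISTS (SELECT 1 FROM sys.database_principals WHERE name = N'NT AUTHORITY\SYSTEM')
    CREATE USER [NT AUTHORITY\SYSTEM] FOR LOGIN [NT AUTHORITY\SYSTEM];
GO
-- Step 1: db_datareader on master for the other queries SQLWriter may run.
EXEC sp_addrolemember N'db_datareader', N'NT AUTHORITY\SYSTEM';
GO
```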

Another symptom of the problem manifests itself when you execute the command: vssadmin list writers. You will find that the above error message(s) are reported and the SQLWriter is not listed in the list of available writers.

Once you have granted the SYSTEM account the necessary permissions, a quick way to verify that everything is working as expected is to run the vssadmin list writers command again. You might get the following output:

Writer name: ‘SqlServerWriter’
Writer Id: <GUID>
Writer Instance Id: <GUID>
State: [1] Stable
Last error: Non-retryable error

If you find that a non-retryable error is reported, then you need to restart the SQL Server VSS Writer service. Once this is done, run the command again and if there are no problems encountered, then you will get the following output for the SQLWriter.

Writer name: ‘SqlServerWriter’
Writer Id: <GUID>
Writer Instance Id: <GUID>
State: [1] Stable
Last error: No error

NOTE: If a SQL Server VSS (Snapshot) Backup is in progress for any instance on the server, then a restart of the SQL Server VSS Writer service should be deferred till the backup is completed or fails.

Easing in the trace FILTER for SQLDIAG


Now that the Diag Manager is available online and you can use custom PSSDIAG/SQLDIAG configuration files to collect data from SQL Server instances, you might want to configure your data collection packages so that profiler traces are set up correctly, with trace filters if needed.

Note: Trace filtering can dramatically reduce trace (.TRC) file size and the I/O cost of tracing, but you should be aware that it can actually increase the CPU burden of tracing. To minimize the extra CPU use, filter on an integer column (dbid, duration, etc.) instead of a text column (database name, textdata, etc.) whenever possible. If a filter doesn’t remove a significant portion of the trace events (for example, more than 10%), it probably isn’t worth it, and might actually introduce more overhead than it prevents. While configuring PSSDIAG/SQLDIAG for SQL Server, you cannot add Profiler trace filters. Even if you add them from the GUI, they are not written to the configuration file, so they are not honored when the server-side trace starts up.

To set filters for profiler traces collected with PSSDIAG/SQLDIAG, you need to:

1. Initialize PSSDIAG/SQLDIAG on the server, that is, start PSSDIAG/SQLDIAG.
2. Find out the Trace ID of the running profiler trace using the fn_trace_getinfo function or the sys.traces view.
3. Use the Trace ID obtained from the above step with the sp_trace_setfilter stored procedure to set the filter. Refer to "SQL Profiler Data Columns" in SQL Server Books Online for the Data Column numbers and to the "sp_trace_setfilter" topic for the values of the logical and comparison operators.
4. To verify that the filter is active, use the fn_trace_getfilterinfo function.
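
For step 2, a query along these lines can help pick out the trace that PSSDIAG/SQLDIAG started. It is only a sketch: excluding the default trace with is_default = 0 is my own assumption about how you would narrow the list, and you still need to recognize the SQLDIAG trace by its output path.

```sql
-- List user-defined server-side traces; the path column shows where each .TRC file is written.
SELECT id, status, path
FROM sys.traces
WHERE is_default = 0;  -- assumption: skip the built-in default trace
```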

I shall demonstrate in this blog post how this works. As you can see in the screenshot, TraceID 2 is the trace I want to customize. Now that I have the trace ID, I stop the profiler trace using the sp_trace_setstatus stored procedure. A status value of 0 stops the trace.

Once I stop the trace, I use the sp_trace_setfilter stored procedure with the appropriate Data Column ID to set a filter on SPID 52. Then I start the trace again. When I look into the profiler trace, I find that before the trace was stopped it was collecting data for all SPIDs, but after setting the filter, data is being collected ONLY for SPID 52 (see the highlighted sections in the screenshot below).

Once the filter is set, you can use the fn_trace_getfilterinfo function to verify that the trace filter is active.

If you are not using SQLDIAG to collect profiler traces, then it is as simple as configuring a server-side trace. The only additional thing you need to do in this case is add the column filters while configuring the profiler trace. The above exercise is required only when you are collecting traces using SQLDIAG/PSSDIAG.

Reference:
How to: Filter Events in a Trace (SQL Server Profiler)
http://msdn.microsoft.com/en-us/library/ms175520.aspx

/*T-SQL commands used*/

select * from sys.traces -- To get the trace id
EXEC sp_trace_setstatus 2,0 -- Stops the trace but doesn’t delete the trace definition from the server
EXEC sp_trace_setfilter 2, 12, 0, 0, 52 -- Add a filter for SPID = 52
EXEC sp_trace_setstatus 2,1 -- Start the trace again
select * from fn_trace_getfilterinfo(2) -- Get information about the filters set for the trace
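
The same stop, filter, restart pattern applies to other integer columns, which is what the note above recommends over text columns. The sketch below filters on DatabaseID (data column 3) instead of SPID. The trace ID 2 is carried over from my example, and N'YourDatabase' is a hypothetical database name that you would replace with your own.

```sql
-- Sketch only: trace ID 2 matches the example above; N'YourDatabase' is a hypothetical name.
DECLARE @traceid int, @dbid int;
SET @traceid = 2;
SET @dbid = DB_ID(N'YourDatabase');  -- resolve the integer database ID once

IF @dbid IS NOT NULL
BEGIN
    EXEC sp_trace_setstatus @traceid, 0;              -- stop the trace before changing filters
    EXEC sp_trace_setfilter @traceid, 3, 0, 0, @dbid; -- column 3 = DatabaseID, 0 = AND, 0 = equal
    EXEC sp_trace_setstatus @traceid, 1;              -- start the trace again
END
```

The value is passed in an int variable because sp_trace_setfilter expects the filter value to match the data type of the column being filtered.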


Hope the above information helps in setting up a trace filter for a profiler trace collected using SQLDIAG.