Restore operation with LiteSpeed fails with VDI error

While performing a backup/restore operation with LiteSpeed you get the following error:
 
SQLVDI: Loc=CVDS::Close. Desc=Abnormal termination state. ErrorCode=(0). Process=11812. Thread=15828. Client. Instance=PROD. VD=Global\VDI_759A94A8-F3A4-4689-AD2F-F1DDED76218B_0_SQLVDIMemoryName_0.
SQLVDI: Loc=SignalAbort. Desc=Client initiates abort. ErrorCode=(0). Process=11812. Thread=15828. Client. Instance=PROD. VD=Global\VDI_759A94A8-F3A4-4689-AD2F-F1DDED76218B_0_SQLVDIMemoryName_0.
 
When you use simple.exe to check if the SQLVDI.DLL is functioning correctly, you find that you can perform a backup and restore without any hassles. Now, you are left with a Catch-22 situation. Is it SQL Server or is it Litespeed? How do I get the backup to restore or the database backed up. If it is the latter, then you have another option, take a SQL Native Backup Wink.
 
So, the first clue is the highlighted portion in the above error message which is pointing to a Client Side VDI call stating that the Client initiated the Abort. Now why the client initiated the Abort depends on the how the code in the application is written to generate the Abort request.
 
If you enable logging on LiteSpeed, you might find something similar to the verbose output below:
 

8/15/2009 7:39:27 AM: RESTORE log [database name] FROM VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_0′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_1′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_2′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_3′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_4′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_5′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_6′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_7′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_8′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_9′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_10′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_11′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_12′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_13′, VIRTUAL_DEVICE=’VDI_9860C32A-718B-457A-8FC4-7F470843F99A_14′ with blocksize=65536, stats=5, maxtransfersize=1048576, buffercount=30, NORECOVERY, NOUNLOAD, STATS = 10
8/15/2009 7:39:27 AM: VDI::ConfigureVDI -> Read registry value for configuration timeout
8/15/2009 7:39:27 AM: VDI::ConfigureVDI -> Waiting 4294967s for backup/restore command to begin
8/15/2009 7:39:27 AM: hr=80004005

8/15/2009 7:39:27 AM: RESTORE LOG is terminating abnormally.
RESTORE cannot process database ‘<database name>’ because it is in use by this session. It is recommended that the master database be used when performing this operation.
8/15/2009 7:39:27 AM: SLS::ExecSQL -> Error: 12309, Msg: SQL Server has returned a failure message to LiteSpeed 2005 which has prevented the operation from succeeding.
The following message is not a LiteSpeed 2005 message. Please refer to SQL Server books online or Microsoft technical support for a solution:
8/15/2009 7:39:27 AM: SLS::ExecSQL -> End Execute Command
8/15/2009 7:39:27 AM: SLS::ExecSQL -> Wait for VDI config thread
8/15/2009 7:39:47 AM: SLS::ExecSQL -> Forced close the VDI config thread
8/15/2009 7:39:47 AM: SLS::ExecSQL -> Original Msg: RESTORE LOG is terminating abnormally.
RESTORE cannot process database ‘<database name>’ because it is in use by this session. It is recommended that the master database be used when performing this operation.
8/15/2009 7:39:47 AM: SLS::ExecSQL -> Error: 12300, Msg: RESTORE LOG is terminating abnormally.
RESTORE cannot process database ‘<database name>’ because it is in use by this session. It is recommended that the master database be used when performing this operation.
8/15/2009 7:39:47 AM: Error: 12300, Msg: SLS::ExecSQL -> user exception
8/15/2009 7:39:47 AM: Error: 12300, Msg: SLS::execSQL -> End execute with unsuccessfull return code

The above set of messages can be misleading at times. This is a known issue with LiteSpeed when they call the client side version GetConfiguration function. This error happens if any login logged into the SQL instance has their default database set to the database that is currently being restored. This problem will manifest even if the SQL Service Account’s default database is the database being restored. The problem doesn’t occur on LiteSpeed version 4.8 to version 5.1. The problem will occur ONLY if the restore is performed from the LiteSpeed GUI or SSMS (by using LiteSpeed XPROCs). This problem will occur for any kind of restore operation (Full, Differential OR Log).

One possible workaround is to extract the backup files from the Litespeed backup set using the LiteSpeed Extractor utility which would extract the backup file into Microsoft SQL native backup format. After that these backups can be restored using T-SQL commands.

A second workaround is to start the restore operation using the command line option of LiteSpeed.

 

How to retrieve data from Showplan XML for Query Compile Event

The Showplan XML for Query Compile profiler event in SQL Server can be very helpful when you see a lot of SPIDs in SQL Server 2005 waiting for Resource Semaphore Query Compile waits. 

RESOURCE_SEMAPHORE_QUERY_COMPILE waits: 

A large number of waits of type RESOURCE_SEMAPHORE_QUERY_COMPILE indicates a large number of concurrent compiles. In SQL Server 2005 we limit the number of compiles than can start at any given time to prevent the following scenario: a large number of compilations are initiated and memory partially allocated. A large percentage of queries land up being suspended (waiting) for resources to complete compilation, eventually timing out and releasing memory allocated to them. 

select * from sys.dm_exec_requests 

where wait_duration_ms <> 0 order by wait_duration_ms desc

If you see a large number of Session IDs waiting on this wait type, then it is time to identify which queries on your server are causing large amount of memory. In this post, I shall not talk in detail about what conditions lead to this kind of wait. However, some information can be found in this blog post by the SQL Dev team: 

http://blogs.msdn.com/sqlprogrammability/archive/2007/01/21/2-0-diagnosing-plan-cache-related-performance-problems-and-suggested-solutions.aspx 

You can also use this query to identify all the queries on the server instance waiting during Compilation: 

select sp.*, st.text from sys.sysprocesses sp CROSS APPLY sys.dm_exec_sql_text(sp.sql_handle) AS st WHERE sp.lastwaittype LIKE 'RESOURCE_SEMAPHORE_QUERY_COMPILE%' ORDER BY sp.waittime DESC;

In brief, the two conditions that can lead to this issue are: 

1. Multiple large ad-hoc queries being submitted (from the application) to the engine which leads to a compilation every time. 

2. A single large query being submitted to the engine which is throttling the execution of all other queries. 

You can use a server side profiler trace to collect a trace with the Showplan XML for Query Compile Event. 

Once you have the profiler trace with you, you need to load the profiler trace into a database table using the query below. 

select identity(int,1,1) as rownumber,* into xmltraceall from fn_trace_gettable ('<location of the trace file>',default where eventclass = 168

The Eventclass filter will only import the Showplan XML for Query Compile events into the database table. After that would need to run the query below to get the compile memory statistics for the queries running on your instance. 

select databasename,objectname,sum(convert(int,substring(textdata, charindex('compilememory', textdata)+len('CompileMemory="'), charindex ('"', textdata, charindex('compilememory', textdata)+len('CompileMemory="'))-(charindex('compilememory', textdata)+len('CompileMemory="'))))) as Total_Compiled_Memory, 

avg(convert(int,substring(textdata, charindex('compilememory', textdata)+len('CompileMemory="'),charindex ('"', textdata, charindex('compilememory', textdata)+len('CompileMemory="'))-(charindex('compilememory', textdata)+len('CompileMemory="'))))) as Avg_Compiled_Memory, 

max(convert(int,substring(textdata, charindex('compilememory', textdata)+len('CompileMemory="'),charindex ('"', textdata, charindex('compilememory', textdata)+len('CompileMemory="'))-(charindex('compilememory', textdata)+len('CompileMemory="'))))) as Max_Compiled_Memory, 

min(convert(int,substring(textdata, charindex('compilememory', textdata)+len('CompileMemory="'), charindex ('"', textdata, charindex('compilememory', textdata)+len('CompileMemory="'))-(charindex('compilememory', textdata)+len('CompileMemory="'))))) as Min_Compiled_Memory, 

count(*) as Counts 

from xmltraceall 

group by databasename,objectname 

order by 3 desc

Alternatively, on the server, you can use the query below to get details of the queries causing havoc on your SQL Server instance: 

WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' AS sp) Select stmt.stmt_details.value('(./sp:QueryPlan/@CompileMemory)[1]', 'bigint') as compile_memory_kb, 

stmt.stmt_details.value('(./sp:QueryPlan/@CompileTime)[1]', 'bigint') as compile_time_ms, 

stmt.stmt_details.value('(./sp:QueryPlan/@CompileCPU)[1]', 'bigint') as compile_cpu_ms, 

stmt.stmt_details.value('(./sp:QueryPlan/@CachedPlanSize)[1]', 'bigint') as CachePlanSize, 

p.objname, replace (replace (stmt.stmt_details.value('@StatementText', 'nvarchar(max)'), CHAR(10), ' '), CHAR(13), ' ') as query_text 

from (select cast (t.query_plan as Xml) showplan,t.dbid ,t.objectid, OBJECT_NAME (t.objectid, t.dbid) 'objname' from sys.dm_exec_cached_plans p1 cross apply sys.dm_exec_text_query_plan (plan_handle,DEFAULt,DEFAULT) t) as p 

cross apply showplan.nodes('//sp:StmtSimple') as stmt (stmt_details) 

order by 3 desc, 1 desc

However, please be advised that this is quite an expensive query and can cause additional performance issues on the server. I would suggest using the server side profiler trace method.

Desktop Heap Issues

Over the past year I have seen multiple servers falling prey to the desktop heap issue due to which CSS has received multiple support calls. And where do I fit into all this. Well SQL Server setup works in two phases, the second phase being service startup. When you have a large (don’t ask me what this number is because it varies based on multiple parameters) number of services running on your server, then the SQL service startup might fail due to which the setup would fail. That is when you would either curse the setup and give it another shot(s) or call CSS.

Read about what the Platforms Escalation team’s blog post about what Desktop Heap is all about:

http://blogs.msdn.com/ntdebugging/archive/2007/01/04/desktop-heap-overview.aspx 

This is when we collect DHeapMon output to analyze the current status of the Desktop Heap. Now, since you have a large number of windows services running on your server, you either have an option of shutting down the non-essential services or follow the steps mentioned in the article below (KB126962).

WORD OF CAUTION: The default values shouldn’t be changed unless Microsoft Support has asked you to do so.

If the total desktop heap becomes the system-wide buffer size, the following event is logged:

Event Type: Warning
Event Category: None
Event ID: 244
Date: Date
Time: Time
User: N/A

Description: Failed to create a desktop due to desktop heap exhaustion.

From the system event logs, you would see the following message:

Failed to create a desktop due to desktop heap exhaustion.

Timeout (30000 milliseconds) waiting for the <SQL Server name> service to connect.
The <SQL Server name> service failed to start due to the following error: The service did not respond to the start or control request in a timely fashion.
The Windows Installer service entered the stopped state.

System      07/08/2009 14:07:07 Warning            244    Win32k                   User information unavailable…    Failed to create a desktop due to desktop heap exhaustion.
System      07/08/2009 14:07:07 Error              7009   Service Control Manager  User information unavailable…    Timeout (30000 milliseconds) waiting for the <SQL Server name> service to connect.
System      07/08/2009 14:07:07 Error              7000   Service Control Manager  User information unavailable…    The SQL Server (<instance name>) service failed to start due to the following error: The service did not respond to the start or control request in a timely fashion.

The above messages would be noticed in the Windows Event Logs around the same time when the SQL setup failed if you are not able to install a SQL patch due to a desktop heap exhaustion issue.

Useful Link

"Out of Memory" error message appears when you have a large number of programs running

http://support.microsoft.com/kb/126962/

184802    User32.dll or Kernel32.dll fails to initialize
http://support.microsoft.com/default.aspx?scid=kb;EN-US;184802

How to search for malicious strings in your database

In the past, CSS had a number of support calls regarding SQL Injection. Nowadays, people are much more aware of how to secure their front-end and back-end systems so that they do not fall prey to malicious attackers and become victims of issues like SQL Injection. Now let’s take a scenario where you have identified a malicious string in your database table. You are not sure how many more table(s)/column(s) can have this kind of malicious string. Searching this would be a daunting task. You can use the script below to run a search on all tables columns that have character data and look for the malicious string pattern that you provide the @string variable. This search would be a very performance intensive search as this would take a long time to execute as it is combing through all character columns in the entire database. So, here’s the warning that you have heard multiple times 🙂 : DO NOT RUN THIS DURING PRODUCTION HOURS. However, if you have a case of SQL Injection, then production should definitely be halted till you have plugged the hole in the security. 

set nocount on
DECLARE @T varchar(255), @C varchar(255),@string varchar(255);
SET @string = '<script' -- Malicious value to be searched for
DECLARE Table_Cursor CURSOR FOR
SELECT a.name, b.name
FROM sysobjects a, syscolumns b
WHERE a.id = b.id AND a.xtype = 'u' AND
(b.xtype = 99 OR --ntext
b.xtype = 35 OR -- text
b.xtype = 231 OR -- nvarchar
b.xtype = 175 OR -- char
b.xtype = 167) -- varchar
declare @sql varchar(8000)
OPEN Table_Cursor;
FETCH NEXT FROM Table_Cursor INTO @T, @C
WHILE (@@FETCH_STATUS = 0)
BEGIN
  select 'Table Name:'+@T
  EXEC('select * from [' + @T + '] where [' + @C + '] ' + 'like  ''%'+@string+'%'' OR ' + @C + ' like ''%EXEC %'''  )
  FETCH NEXT FROM Table_Cursor INTO @T, @C;
END
CLOSE Table_Cursor;
DEALLOCATE Table_Cursor;
set nocount off

PAGE_VERIFY option in SQL Server

A new feature is available in SQL Server 2005 and higher versions to enabled Page Checksum for a database page to help detect errors in the IO path.

The SQL Server storage engine has already blogged about this:

http://blogs.msdn.com/sqlserverstorageengine/archive/2006/06/29/Enabling-CHECKSUM-in-SQL2005.aspx 

The reason I’m posting an additional article is the misconceptions related enabling Checksum for a SQL Server database. You can find out what is the PAGE VERIFY option for your database using the system catalog sys.databases with the help of the query below:

select database_id, page_verify_option, page_verify_option_desc

from sys.databases

The PAGE VERIFY option has 3 possible values:

0 = NONE

1 = TORN_PAGE_DETECTION

2 = CHECKSUM

Let’s take an example here:

You created a database without CHECKSUM enabled. So, all pages that were written to disk now do not have the Checksum property was enabled on it. After you enable Checksum on the database all the objects that were created prior to enabling Checksum on the database will not have the Checksum written into the Page header. When these pages are brought from the disk into the SQL Server Buffer Pool (RAM) and the dirty bit of the page is set (when any DML operation modifies the page), only then would the Checksum get calculated and written into the page header before it gets written back to the physical media (disk). This happens during the Checkpoint operation. So, if you brought a page into the SQL Server Buffer Pool and didn’t dirty the page (i.e. only for a SELECT operation), then the checksum will not be written onto the page header.

One of the recommendations would be to dirty all pages using operations like DBCC DBREINDEX so that all pages that didn’t have Checksums written into the header now have Checksums written onto it after enabling Checksum on a database.

Why is this by-design?

The reason for this is we cannot estimate the size of the database and the number of database pages that would need checksums calculated and written onto the disk. This would involve a huge performance degradation which is avoided by the current design.

From SQL Server Books Online:

PAGE_VERIFY

When CHECKSUM is specified, the Database Engine calculates a checksum over the contents of the whole page and stores the value in the page header when a page is written to disk. When the page is read from disk, the checksum is recomputed and compared to the checksum value stored in the page header.

When TORN_PAGE_DETECTION is specified, a specific bit for each 512-byte sector in the 8-kilobyte (KB) database page is saved and stored in the database page header when the page is written to disk. When the page is read from disk, the torn bits stored in the page header are compared to the actual page sector information.

When NONE is specified, database page writes will not generate a CHECKSUM or TORN_PAGE_DETECTION value. SQL Server will not verify a checksum or torn page during a read even if a CHECKSUM or TORN_PAGE_DETECTION value is present in the page header.

Another interesting post by Bob Dorr (Principal Escalation Engineer – SQL Server):

http://blogs.msdn.com/psssql/archive/2008/05/22/how-it-works-sql-server-2005-possible-data-recovery-when-checksum-failure-reported.aspx