How to find who is using/eating up the Virtual Address space on your SQL Server

Posted on February 16, 2010 by Sudarshan Narasimhan

This is often a tricky situation: you are running into virtual address space (VAS) fragmentation and getting OOM (out-of-memory) errors in your SQL Server. Most often, people do not understand the difference between physical memory pressure and virtual memory pressure. Adding more RAM is definitely not a solution here! Here are some sample error messages you might have seen in your SQL Errorlog, which help indicate whether the issue is physical memory or virtual memory:

SQL 2000
WARNING: Failed to reserve contiguous memory of Size= 65536.
SQL Server could not spawn process_loginread thread.

SQL 2005 and above
Failed Virtual Allocate Bytes: FAIL_VIRTUAL_RESERVE 122880

Error: 701, Severity: 17, State: 123.
There is insufficient system memory to run this query.

As the errors above indicate, the problem is in reserving a fixed size in the virtual address space of SQL Server. Note: the size indicated above is in bytes, e.g. 65536/1024 = 64 KB.
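If you are scanning many errorlog entries, the conversion is easy to script. Here is a minimal Python sketch using the two sizes from the sample messages above; the helper name bytes_to_kb is only for illustration.

```python
def bytes_to_kb(size_in_bytes: int) -> float:
    """Convert a size reported in bytes to kilobytes (1 KB = 1024 bytes)."""
    return size_in_bytes / 1024


# Sizes copied from the sample errorlog messages above.
failed_requests = {
    "SQL 2000 contiguous reserve": 65536,
    "SQL 2005 FAIL_VIRTUAL_RESERVE": 122880,
}

for label, size in failed_requests.items():
    print(f"{label}: {size} bytes = {bytes_to_kb(size):.0f} KB")
```

Both failed requests are small (64 KB and 120 KB), which is why these errors point at a shortage of contiguous address space rather than a shortage of RAM.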

Typical symptoms you would notice in these situations are:

1) Database or log backups might start failing.
2) You are unable to make new connections to SQL Server.
3) Certain jobs which require memory from the non-BPool region will fail.

Many a time, these problems go away automatically without any action taken. This indicates an intermittent problem: at a certain point in time there was virtual memory pressure, which resulted in the above messages being printed to the SQL Errorlog. Dealing with VAS fragmentation or running out of VAS on SQL Server 2000 was rather painful and required setting up additional debugging techniques to get to the bottom of the issue. You might have used some of the following tools:

1) VMSTAT
2) Debugging Tools for Windows (to capture a manual dump of sqlservr.exe)
3) Trace flag 2551 (-T2551) to generate a filtered dump in SQL Server when running into an OOM condition.
4) TLIST.exe to identify modules loaded in SQL Server.

Luckily, starting with SQL 2005 there is an in-memory DMV, sys.dm_os_virtual_address_dump, which tracks the virtual address space (VAS) of your SQL Server process.

Here are some queries which will help you find out how much of the virtual address space of sqlservr.exe is FREE and how much is available in total (Free + InUse).

1. This will tell you the size of the biggest contiguous free block in VAS

SELECT CONVERT(varchar, GETDATE(), 120) AS [Timestamp],
       MAX(region_size_in_bytes)/1024 AS [Total max contiguous block size in KB]
FROM sys.dm_os_virtual_address_dump
WHERE region_state = 0x00010000 --- MEM_FREE

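2. This will also tell you the size of the largest contiguous free block, plus the total free address space. Regions with a non-zero allocation base are counted as Reserved; these include the region marked as MEM_RESERVE (this is your non-BPool area reserved during SQL startup, sometimes referred to as MTL or MemToLeave)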

With VASummary(Size,Reserved,Free) AS
(SELECT
    Size = VaDump.Size,
    Reserved =  SUM(CASE(CONVERT(INT, VaDump.Base)^0)
    WHEN 0 THEN 0 ELSE 1 END),
    Free = SUM(CASE(CONVERT(INT, VaDump.Base)^0)
    WHEN 0 THEN 1 ELSE 0 END)
FROM
(
    SELECT  CONVERT(VARBINARY, SUM(region_size_in_bytes))
    AS Size, region_allocation_base_address AS Base
    FROM sys.dm_os_virtual_address_dump 
    WHERE region_allocation_base_address <> 0x0
    GROUP BY region_allocation_base_address 
UNION  
    SELECT CONVERT(VARBINARY, region_size_in_bytes), region_allocation_base_address
    FROM sys.dm_os_virtual_address_dump
    WHERE region_allocation_base_address  = 0x0
)
AS VaDump
GROUP BY Size)  
SELECT SUM(CONVERT(BIGINT,Size)*Free)/1024 AS [Total avail Mem, KB] ,CAST(MAX(Size) AS BIGINT)/1024 AS [Max free size, KB] 
FROM VASummary 
WHERE Free <> 0
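The two numbers returned by the query above can differ widely, and that gap is what fragmentation means. Here is a small Python sketch of the idea; the region list is a made-up address-space layout, not output from the DMV.

```python
from itertools import groupby

# Hypothetical VAS layout in address order: (state, size_in_kb).
regions = [
    ("FREE", 64), ("COMMIT", 512), ("FREE", 128), ("RESERVE", 1024),
    ("FREE", 32), ("FREE", 96), ("COMMIT", 256), ("FREE", 64),
]


def free_summary(regions):
    """Return (total_free_kb, largest_contiguous_free_kb)."""
    total_free = 0
    largest = 0
    # Adjacent FREE entries form a single contiguous run, so group them.
    for state, run in groupby(regions, key=lambda r: r[0]):
        if state != "FREE":
            continue
        run_size = sum(size for _, size in run)
        total_free += run_size
        largest = max(largest, run_size)
    return total_free, largest


total_free, largest = free_summary(regions)
print(f"Total free: {total_free} KB, largest contiguous block: {largest} KB")
```

In this made-up layout 384 KB is free in total, but no single block is larger than 128 KB, so a request for 256 KB of contiguous address space would fail.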

3. The query below will identify the memory reserved by non-BPool components in SQL Server

SELECT SUM(virtual_memory_reserved_kb)/1024 AS virtual_memory_reserved_mb
FROM sys.dm_os_memory_clerks
WHERE type NOT LIKE '%bufferpool%'

4. To identify whether any of the space is being used by the SQL Server procedure cache itself, you can use this query:

SELECT SUM(pagesused)*8/1024 AS 'MB of MemToLeave memory consumed by procedures'
FROM master.dbo.syscacheobjects
WHERE pagesused > 1

DBCC MEMORYSTATUS also has good information on whether the usage from the non-BPool region is coming from SQL or non-SQL components. If it is SQL 2000, check the value of “OS Committed”; if it is SQL 2005/2008, look at the value of “MultiPage Allocator” for each memory clerk. Just to re-state my assumption in case you are not sure:
1 SQL Server page = 8 KB -> SinglePage Allocator
> 1 page or > 8 KB -> MultiPage Allocator

Apart from these, you need to pay special attention to the following components, as none of them use memory from the Buffer Pool region; they make direct VirtualAlloc() calls to reserve memory and then commit it:

1) Any OLE/COM components loaded in SQL Server
2) Extended Stored Procedures (use sys.dm_os_loaded_modules to identify the modules loaded in the SQL Server process space).
3) SQL Mail components
4) Any prepared documents using sp_xml_preparedocument
5) Linked Server Providers
6) Large Plans stored in Procedure Cache
7) Very frequent Backups also may cause MTL area depletion. (Please investigate using the parameters MAXTRANSFERSIZE and BUFFERCOUNT, if this is the case).
8) SQL CLR (recommended to be used on 64-bit SQL Servers)
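For item 7, the BACKUP documentation gives the total space used by backup buffers as BUFFERCOUNT * MAXTRANSFERSIZE. Here is a rough Python sketch of that arithmetic; the parameter values are hypothetical, so substitute the ones from your own backup commands.

```python
def backup_buffer_mb(buffer_count: int, max_transfer_size_bytes: int) -> float:
    """Approximate buffer space used by one backup: BUFFERCOUNT * MAXTRANSFERSIZE."""
    return buffer_count * max_transfer_size_bytes / (1024 * 1024)


# Hypothetical settings, for illustration only.
modest = backup_buffer_mb(buffer_count=10, max_transfer_size_bytes=1048576)
aggressive = backup_buffer_mb(buffer_count=50, max_transfer_size_bytes=4194304)

print(f"10 buffers x 1 MB transfers: {modest:.0f} MB")
print(f"50 buffers x 4 MB transfers: {aggressive:.0f} MB")
```

With aggressive settings, a single backup can ask for a sizeable share of a few-hundred-MB MTL region, and concurrent or very frequent backups make it worse.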

The above list is certainly not exhaustive, but it is more than enough to get started looking at the right areas. While we are on this topic, it is important to understand the difference between a Reserve and a Commit. These are Windows concepts and remain the same for SQL Server as well; after all, it's VirtualAlloc() underneath the covers.

MEM_COMMIT: Region in VAS that is backed by RAM/paging file
MEM_RESERVE: Region in VAS with no actual physical storage either in RAM or in the paging file.

The problems we talked about so far occur when a call to VirtualAlloc() with MEM_RESERVE is made and that “reservation” fails, resulting in the errors printed to the errorlog. Most times, the call to reserve is subsequently followed by a COMMIT, but it's not mandatory. I can reserve now and commit later on using the base address of the reservation. SQL Server is smart enough that during startup it reserves a certain region of the address space, referred to as the MTL or non-BPool region. It is here that the crunch is, and it is here that the issue needs to be investigated/fixed.
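To make the reserve-now, commit-later idea concrete, here is a toy Python model of an address space. It is a sketch of the concept only: the class ToyAddressSpace and its methods are invented for illustration and do not call the real VirtualAlloc().

```python
class ToyAddressSpace:
    """Toy model of a process address space, tracked in fixed-size units."""

    def __init__(self, units: int):
        self.state = ["FREE"] * units  # each unit is FREE, RESERVE or COMMIT

    def reserve(self, units: int) -> int:
        """Mark `units` contiguous FREE units as RESERVE and return the base index."""
        run = 0
        for i, s in enumerate(self.state):
            run = run + 1 if s == "FREE" else 0
            if run == units:
                base = i - units + 1
                self.state[base:i + 1] = ["RESERVE"] * units
                return base
        raise MemoryError(f"cannot reserve {units} contiguous units")

    def commit(self, base: int, units: int) -> None:
        """Back previously reserved units with storage."""
        in_range = 0 <= base and base + units <= len(self.state)
        if not in_range or any(s != "RESERVE" for s in self.state[base:base + units]):
            raise ValueError("range must be reserved before it is committed")
        self.state[base:base + units] = ["COMMIT"] * units


vas = ToyAddressSpace(8)
base = vas.reserve(4)   # reserve now ...
vas.commit(base, 2)     # ... and commit part of it later, using the base address
print(vas.state)

try:
    vas.reserve(5)      # only 4 contiguous FREE units remain
except MemoryError as exc:
    print("reserve failed:", exc)
```

The failed reserve(5) call is the toy equivalent of the FAIL_VIRTUAL_RESERVE message: the reservation step is what fails, before any memory is committed.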

A quick workaround for the above issues is to add the startup parameter -gXXX (sample usage: -g512), where XXX is the amount in MB to reserve on startup. I would advise against doing this, as it is only a workaround: you are increasing the MTL region rather than finding out who/what is consuming it. Slava’s blog is also a good read on this topic.

How to remove unwanted Quotation Marks while importing a data file

Posted on February 10, 2010 by Amit Banerjee

I have had multiple questions on how users wanted to remove the quotation marks while importing the data into a SQL Server database table while using BCP or BULK INSERT. Last month I worked on a similar issue where the need was to strip out the quotation marks for all the data columns present in the .DAT file. Read my post on SQLServerFAQ to find out how this can be accomplished using BCP, BULK INSERT or if you like using UI, the Export/Import Wizard.

How to identify the SQL Server version from a Profiler Trace

Posted on February 5, 2010 by Amit Banerjee

There are certain little tips and tricks for the SQL Server tools which are not documented anywhere. This is one of them. I needed to find out the version of the SQL Server instance from which a Profiler Trace was captured. I opened the Profiler trace using SQL Profiler 2008, clicked on the Trace properties and, lo and behold, I had the SQL Server instance name and version.

I got the SQL Server release, version and the instance name (Trace provider name).

CHECKDB with Repair Allow Data Loss: Should I shoot myself?

Posted on February 4, 2010 by Amit Banerjee

Over the past years, I have seen multiple DBAs execute DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS to repair corruption in their SQL Server database without understanding the repercussions. Even when you are working with CSS on such issues, CSS engineers explain what the command does and also send across a disclaimer stating why this command should NOT be executed on a database and what the repercussions are. Alas, due to the lack of a sound database backup strategy, a lot of DBAs are forced to resort to losing data. In a nutshell, REPAIR_ALLOW_DATA_LOSS first tries to repair the corruption in the page; if it cannot, it checks whether it can drop the page to repair the error, hence the name Allow Data Loss! I have a question for all the DBAs who advocate the use of this command:

How would you like your Payroll DBA to run the command? (Good enough food for thought!!)

The only way to find out how much data you lost would be to perform a CHECKDB before running CHECKDB with REPAIR_ALLOW_DATA_LOSS and another after the repair has completed. The row counts returned by the two CHECKDB outputs would then have to be compared. If CHECKDB is failing due to system catalog corruption, then you are plain out of luck.
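Once you have the per-table row counts from both CHECKDB runs, the comparison is easy to script. Here is a minimal Python sketch; the table names and counts are hypothetical, and the function lost_rows is only for illustration.

```python
def lost_rows(before: dict, after: dict) -> dict:
    """Return {table: rows_lost} for every table whose row count dropped."""
    return {
        table: before[table] - after.get(table, 0)
        for table in before
        if after.get(table, 0) < before[table]
    }


# Hypothetical row counts transcribed from the two CHECKDB outputs.
before_repair = {"Employee": 1200, "Department": 45, "Payroll": 98000}
after_repair = {"Employee": 1200, "Department": 38, "Payroll": 97640}

print(lost_rows(before_repair, after_repair))
```

This only tells you how many rows are gone from each table, not which rows or what they contained.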

What most people fail to understand is that CHECKDB can verify the integrity of the database only against the SQL database rules imposed on it, like referential integrity, data length, record structures, page structures, header correctness etc. These are the database integrity rules that SQL Server is aware of. What about the business logic rules that you imposed while designing the database?

E.g.: Let’s say you had a database table called Employee which depended on a Department ID value being present in the Department table. No foreign key relationship exists between the two tables. You experienced corruption in the database and used the Repair Allow Data Loss option along with CHECKDB, which dropped a few pages from the Department table. However, CHECKDB cannot know that a Department ID is missing in the Department table because you didn’t create a Foreign Key between the two tables. So when your application now tries to retrieve the Department ID of the Employee, it will BARF on you! This is the kind of scenario we specifically want to avoid, which is why we advise against the use of data loss commands.
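If you do end up in this situation, you have to look for the orphaned rows yourself. The Python sketch below reproduces the scenario using an in-memory SQLite database as a stand-in for SQL Server; the table layouts, names and values are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, Name TEXT);
    -- No foreign key on purpose, mirroring the scenario described above.
    CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, Name TEXT, DepartmentID INTEGER);
    INSERT INTO Department VALUES (1, 'Finance'), (2, 'HR'), (3, 'IT');
    INSERT INTO Employee VALUES (10, 'Asha', 1), (11, 'Ben', 2), (12, 'Chen', 3);
    -- Simulate the repair dropping the page that held department 3.
    DELETE FROM Department WHERE DepartmentID = 3;
""")

# Employees whose DepartmentID no longer exists in Department.
orphans = conn.execute("""
    SELECT e.EmployeeID, e.Name, e.DepartmentID
    FROM Employee AS e
    LEFT JOIN Department AS d ON d.DepartmentID = e.DepartmentID
    WHERE d.DepartmentID IS NULL
""").fetchall()

print(orphans)
```

The same LEFT JOIN ... IS NULL pattern works in T-SQL, but you would have to write one such check for every implicit relationship in your schema, which is exactly the knowledge CHECKDB does not have.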

SQL Server wouldn’t know what data is supposed to be present in the table rows. The only thing that SQL can verify while you are retrieving data or checking the integrity of the database is that the column data and the column datatype match. I have heard questions like “Why can’t SQL Server figure out if the value is correct?”. Well, would you know why a person’s name is Uncle Sam and not Uncle Bill? 🙂 So how would SQL Server know that the FirstName column (an example) was supposed to store Sam and not Bam? As far as SQL Server is concerned, both strings match the column datatype, hence from a structural and logical integrity standpoint the data is correct.