Tools Tips and Tricks #11: Debug Diag and Memory leaks

Earlier this week I showed how to use the Debug Diagnostic tool to capture a dump for a first chance exception encountered by an application and perform analysis using the Crash Analysis rule. Today I am going to show how to use the Debug Diagnostic tool to track outstanding memory allocations for a process.

Steps

1. Launch the Debug Diagnostic tool and click on the Rules tab. Click on Add Rule.

2. Select the Native (non-.NET) Memory and Handle Leak rule. (see screenshot on the right)

3. You cannot set up a memory leak tracking rule for a process that is not running, as the Leak Tracking dll has to hook onto the process. In this example, I will be using the tool to track an executable called MemAllocApp.exe. Select the required process using the Select Target window. (see screenshot on the left)

4. The next window, titled “Configure Leak Rule”, lets you get granular with your tracking requirements. I have opted not to generate a dump after n minutes of tracking (Option: Generate final userdump after x minutes of tracking). I have selected an auto-unload of the Leak Tracking DLL once the rule is completed or deactivated (Option: Auto-unload Leak Track when rule is completed or deactivated). (see screenshot below)

5. Click on the Configure button to configure additional options for the userdump generation for the process being tracked (Configure userdumps for Leak Rule window in the screenshot below). I have configured the rule to capture a dump automatically if the process that I am tracking unexpectedly shuts down (Option: Auto-create a crash rule to get userdump on unexpected process exit). Additionally, I have configured the rule to capture a userdump once the private bytes for the process reach 350MB (Option: Generate a userdump when private bytes reach x MB). As you can see in the screenshot below, there are additional options that you can configure, but I don’t need them for this particular demo.

6. Next you get the “Select Dump Location and Rule Name” window, where you can change the rule name and the location where the dumps are generated. By default the dumps are generated in the <Debug Diagnostic Install Path>\Logs\<Rule Name> folder.

7. Click on Activate Rule in the next window to start the tracking.

Note: If you are not in the same session as the Debug Diag Service, then you will get the following pop-up once you have configured the rule. Click on Yes, and then you should get a pop-up stating that Debug Diag is monitoring the EXE for leaks.

Process MemAllocApp.exe(15316) is in the same logon session as DebugDiag (session 2), but it is not in the same logon session as the DbgSvc service (session 0).  Do you want to return to ‘Offline Mode’ and continue?

On the Rules tab, you should see two rules: one for the leak tracking and the other for the crash rule. Once I hit the threshold of 350MB of private bytes, a dump will be generated and the Userdump Count column value should change to 1. I was monitoring my application’s Private Bytes perfmon counter and the graph showed a steady increase. (see screenshot below) Now that the rule is active, I can see that the Being Debugged column has the value “Yes” and the LeakTrack Status column value is Tracking for MemAllocApp.exe under the Processes tab. I then used the Analyze Data button under the Rules tab to generate the memory tracking report of a memory dump that I had captured earlier, which I analyzed; these are a few excerpts from the report.

The Analysis Summary tells me that I have outstanding memory allocations of 205MB. This dump was generated using a rule to capture a userdump when Private Bytes for the process exceeded 200MB. Next I shall look at the Virtual Memory Analysis Summary sub-heading…


This clearly tells me that the memory allocations are coming from the Native Heaps. And I know from the previous screenshot that a Heap Allocation function (HeapAlloc) is being called. Now digging into the Outstanding Allocation Summary, I find that over 200MB of allocations have been made by my application and all allocations have been made on the heap. In the Heap Analysis summary, I find that the allocations have all come from the default process heap. Drilling down into the MemAllocApp hyperlink, I get the offset making these allocations, which is MemAllocApp+2cbb.

The function details from the report are available in the quoted text below. If I have the debug symbols of the application (which I do), I find that this corresponds to my function fn_allocatememory, which makes 5MB allocations using HeapAlloc on the default process heap. If you align your symbols correctly for the analysis, you will find that the report also gives you the correct function names.

Function details

Function: MemAllocApp+2cbb
Allocation type: Heap allocation(s)
Heap handle: 0x00000000`00000000
Allocation Count: 41 allocation(s)
Allocation Size: 205.00 MBytes
Leak Probability: 52%

So without any debugging commands, I was able to drill down to the culprit making the maximum number of allocations. This can be quite a useful way of tracking down non-BPool (commonly known as MemToLeave on 32-bit SQL instances) allocations when the Multi Page Allocations don’t show a high count but you are experiencing non-BPool memory pressure.
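If you are on the SQL Server side of this problem, a quick way to see where multi-page (non-BPool) allocations are going is to query the memory clerks DMV. This is a sketch I am adding for illustration, assuming SQL Server 2005/2008, where sys.dm_os_memory_clerks exposes a multi_pages_kb column (the column was removed in SQL Server 2012):

-- Sketch (SQL 2005/2008): rank memory clerks by multi-page
-- allocations, i.e. contiguous allocations > 8KB made outside
-- the Buffer Pool. multi_pages_kb was removed in SQL 2012.
SELECT TOP 10 type, SUM(multi_pages_kb) AS multi_pages_kb
FROM sys.dm_os_memory_clerks
GROUP BY type
ORDER BY SUM(multi_pages_kb) DESC;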

The fn_allocatememory function code is mentioned below:

void fn_allocatememory(int cntr)
{
    printf("Sleeping for 10 seconds\n");
    Sleep(10000);
    // Allocate 5MB (5242880 bytes) from the default process heap
    // and never free it -- this is the leak being tracked.
    BYTE* pByte = (BYTE*) HeapAlloc(GetProcessHeap(), 0, 5242880);
    (*pByte) = 10; // touch the allocation so the memory is committed
    printf("Iteration %d: Allocating 5MB using HeapAlloc\n", cntr + 1);
}

I used the same HeapAlloc function that Sudarshan had used in his latest blog post to explain behavior changes in Windows Server 2008 for tracking heap corruptions.


Common SQL Server myths – Series II – SQL Memory

Not so long ago, I wrote a blog post on SQL Server myths related to Setup & Service startup for T-SQL Tuesday, which was hosted by Sankar Reddy (Blog | Twitter). The round-up of the T-SQL Tuesday posts had some awesome content where renowned SQL Community members debunked common SQL misconceptions. Read the round-up here.

I found that this was quite a compelling topic for a series and decided to do a Part II for SQL myths around memory related misconceptions. SQL memory by itself is a murky topic, and there are quite a few urban legends so deeply ingrained in people’s minds that we have to battle against them to get users to see the light at the end of the tunnel. So, here goes another round of debunking some more misconceptions.

AWE needs to be enabled on 64-bit instances for SQL Server
I think everyone who has posted an answer for this question on MSDN forums has probably grown tired of saying the same thing over and over again. This configuration option is required for 32-bit instances of SQL Server only, to enable the instance to benefit from Address Windowing Extensions. The AWE option on 64-bit instances has no meaning: it is not required and basically does nothing. Please refer to the following blog post for more details:
http://blogs.msdn.com/b/slavao/archive/2006/03/13/550594.aspx
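To quickly verify what the option is set to on an instance, here is a minimal sketch using sp_configure ('awe enabled' is an advanced option, hence the first two statements; remember it only has an effect on 32-bit instances):

-- Sketch: check the 'awe enabled' setting. On 64-bit instances
-- the value is ignored and does nothing.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'awe enabled';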


System.OutOfMemoryException encountered while executing a query from SSMS is not a Database Engine Out-of-Memory condition
I have seen multiple suggestions on forums for reducing Max Server Memory to address this. This error is a .NET exception telling the user that an OOM condition occurred in SSMS. The database engine is not a managed application; it is written in native code and hence wouldn’t throw a .NET OOM error. I have written about this in an earlier blog post: System.OutOfMemoryException for Management Studio


Windows Task Manager will tell me SQL Server’s memory usage
Another urban legend that CSS deals with day in and day out. The “Monitoring Memory Usage” topic in SQL Server Books Online talks about how to monitor SQL Server memory usage. None of the tools you typically use to inspect application memory use (Task Manager, Perfmon/Sysmon, etc.) show the amount of AWE memory used by individual processes. There’s no indication of the amount of AWE memory used by each process, nor is this memory included in the working set size reported for a given process. Reference: http://blogs.msdn.com/b/psssql/archive/2009/09/11/fun-with-locked-pages-awe-task-manager-and-the-working-set.aspx
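Instead of Task Manager, you can ask SQL Server itself. Here is a small sketch against sys.dm_os_performance_counters (SQL 2005 and above; these are the same counters Perfmon exposes under the Memory Manager object):

-- Sketch: SQL Server's own view of its committed memory,
-- which includes AWE/locked pages that Task Manager misses.
SELECT counter_name, cntr_value AS value_kb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Memory Manager%'
  AND counter_name IN ('Total Server Memory (KB)', 'Target Server Memory (KB)');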


Max Server Memory will limit the SQL Server memory
This is true only for the SQL Server Buffer Pool. The Max Server Memory setting is used to control the size of the Buffer Pool only. For a 64-bit instance, the non-BPool region can grow to a maximum of Total Physical Memory – (Max Server Memory + Memory used by Windows & other applications). I had blogged about this in the past in my post: Non Bpool region on 64bit SQL instances may be larger than you think


SQL Server eats up all the available memory on the box
SQL Server is a server application. Ideally, you should have a dedicated server for a SQL instance that has a high number of database transactions. If that is not the case, then Max Server Memory needs to be capped after taking into account the memory requirements of Windows, SQL Server and the other applications on the box. SQL Server (especially 64-bit) is designed not to back off its memory footprint once it grows, unless asked to because of a low memory notification received from the Operating System. The rule of thumb of leaving just 2GB for the OS and other applications may not be enough when you have file copies, SSRS, SSIS and the SQL Database Engine running on the same box. SQL Server 2005 and above is designed to react to low memory notifications sent out by Windows and scale back its memory usage as required.
Reference: http://msdn.microsoft.com/en-us/library/ms178145.aspx
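If you do need to cap it, here is a minimal sketch of setting Max Server Memory with sp_configure; the 24GB value is just an illustration for a 32GB box and must come from your own sizing exercise:

-- Sketch: cap the Buffer Pool at 24GB (value is in MB).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 24576;
RECONFIGURE;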


Virtual Address Space – Isn’t that an excuse?
Virtual Address Space defines the amount of memory that can be addressed by any application, and the limiting factor is always the processor architecture: 32-bit or 64-bit. 32-bit Windows can address up to 4GB (2^32 bytes), and 64-bit Windows can address a user-mode maximum of 8TB, depending on the Windows edition (the theoretical 2^64 limit is far larger). The addressing limitation on a 64-bit instance is, in practice, the availability of your physical RAM, which is the exact opposite of the story on a 32-bit server. With Physical Address Extension enabled on Windows servers, Windows can use more than 4GB of memory, and SQL Server (or any other Windows application, if coded correctly) can take advantage of this by using the AWE APIs. If you are falling prey to VAS limitations, then you either have a large VAS consumer or your workload is not the most optimal workload for a 32-bit architecture! If you are facing a Virtual Address crunch on your SQL instance, then follow the steps mentioned in this post to work towards determining the root cause of the problem: Who is eating up the Virtual Address space on your SQL Server
Reference: http://technet.microsoft.com/en-us/library/ms187499(SQL.90).aspx


I can change –g to whatever I want!
Absolutely not! The hard-coded upper limit for –g is 1024MB. Whenever you increase MemToLeave on a 32-bit machine, you are decreasing the SQL Server Buffer Pool memory proportionately. Setting this for a 64-bit instance is a bit of a moot point. So increasing –g is just a workaround, and sometimes it might lead you to face BPool pressure on the server. I had documented the layout of the buffer pool for a 32-bit instance in an earlier blog post: SQL Server Memory Basics
Reference: http://msdn.microsoft.com/en-us/library/ms190737(v=SQL.90).aspx


I am smart! I will enable /3GB and then increase –g. All problems solved!
Well, I would say you are too smart for your own good. For systems with more than 16GB of RAM, this can prove to be counter-productive: it will cause a lack of available System PTEs on the Windows server and create more problems than it solves. From the “Using AWE” topic in SQL Server Books Online:

If there is more than 16 GB of physical memory available on a computer, the operating system requires 2 GB of virtual address space for system purposes and therefore can support only a 2 GB user mode virtual address space. For the operating system to use the memory range above 16 GB, be sure that the /3gb parameter is not in the Boot.ini file. If it is, the operating system cannot use any physical memory above 16 GB.


Lock Pages in Memory is the next best thing after sliced bread!
How many times have we heard this: “Enable LPIM for the SQL service account and everything will be alright”! Everything will be alright only when you have taken into consideration the memory requirements of the SQL Server instance(s) on the box, the other applications and Windows, and then capped the Max Server Memory. Without that, enabling LPIM for the SQL service account is as good as riding a bike on the freeway without a helmet. It’s all smooth sailing till disaster strikes!
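If you want to verify whether locked pages are actually in play on an instance, here is a sketch for SQL Server 2008 and above, where sys.dm_os_process_memory is available:

-- Sketch (SQL 2008+): a non-zero value means the instance is
-- using Lock Pages in Memory for its buffer pool allocations.
SELECT locked_page_allocations_kb
FROM sys.dm_os_process_memory;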

 
Hope the above points help clear out some urban legends regarding SQL Server memory concepts, and that during your next coffee break or water cooler discussion they help you convince a *misguided SQL Server expert*!

Non Bpool region on 64bit SQL instances may be larger than you think

One of my older posts talked about SQL Server Memory Basics. The infamous MemToLeave region was shown with a default value of 384MB. This post was penned a long time back and the SQL landscape has changed since then. Since the post was mainly based on showing the memory architecture for SQL Servers in a 32-bit environment, the calculations shown were based on the 2GB user address space limitation for x86 architecture.

With the adoption of the 64-bit architecture, the 2GB user-mode address space limit (3GB if you use the /3GB switch) went out of the window. One of the lesser-known facts, which I have seen come up multiple times while addressing non-BPool memory consumption concerns on MSDN, is that the non-BPool region is now not just 384MB.

Before I go on to explain this, I would like to point out that the Max Server Memory setting for SQL Server (irrespective of architecture) is concerned with the BPool region only. It doesn’t limit the amount of memory that can be used by components which request contiguous allocations above 8KB in size. The non-BPool region in 64-bit environments is now limited not by your virtual address space but by the physical memory (RAM) on the server. Let me explain this with an example.

If you have a SQL Server instance on a 64-bit box with Max Server Memory set to 24GB and the total physical memory on the server is 32GB, then your SQL instance can theoretically use the remaining 8GB for contiguous memory allocations requested above 8KB in size. This is why I keep advocating an understanding of the environment before setting Max Server Memory for the SQL instance. Leaving 2 or 3GB for your Operating System and the non-BPool memory is just not going to cut it if you have components active within SQL Server which require huge amounts of contiguous memory. I have already listed the most common non-BPool memory consumers in a previous post.

The reason this happens is that in 64-bit environments you are no longer constrained by the 2 or 3GB user address space limitation, because your address space runs into a few TBs. So the limiting factor here is your physical memory (RAM). Once you have set Max Server Memory to a finite value, the SQL instance is told that the Buffer Pool cannot grow beyond this value, but this doesn’t limit non-BPool allocations. As long as you have available physical memory on the server, SQL can continue using it for contiguous allocation requests above 8KB in size.

This makes it highly important for you to evaluate which components in your SQL environment can make such allocation requests, and to set Max Server Memory in such a manner that you starve neither of the two memory regions in SQL Server, nor any of the other applications running on the server, nor the Operating System. Unless you do this, you could end up troubleshooting SQL memory pressure very frequently on the box, especially issues like paging or general server unresponsiveness/sluggishness, before you hit the nail on the head!

To summarize, two points that you need to keep in mind for 64-bit SQL environments:

1. Max Server Memory setting only limits Buffer Pool

2. Non-BPool memory region for SQL is effectively now: Total Physical Memory – (Max Server Memory + Physical Memory used by OS and other applications)
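As a quick sanity check of point 2, here is a sketch comparing the configured Max Server Memory against the physical RAM on the box. It assumes SQL 2005/2008, where sys.dm_os_sys_info exposes physical_memory_in_bytes (renamed physical_memory_kb in SQL 2012):

-- Sketch: the gap between these two numbers is what the OS,
-- other applications, and the non-BPool region must share.
SELECT
    (SELECT CONVERT(INT, value_in_use)
     FROM sys.configurations
     WHERE name = 'max server memory (MB)') AS max_server_memory_mb,
    physical_memory_in_bytes / 1048576      AS total_physical_memory_mb
FROM sys.dm_os_sys_info;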

How to find who is using/eating up the Virtual Address space on your SQL Server

Well, this is often a tricky situation, where you are running into Virtual Address Fragmentation and getting OOM (out-of-memory) errors in your SQL Server. Most often, people do not understand the difference between physical memory pressure and virtual memory pressure. Adding more RAM is definitely not a solution here! Here are some sample error messages you might have seen in your SQL Errorlog which will indicate whether this is a physical memory or a virtual memory issue:

SQL 2000
WARNING: Failed to reserve contiguous memory of Size= 65536.
SQL Server could not spawn process_loginread thread.

SQL 2005 and above
Failed Virtual Allocate Bytes: FAIL_VIRTUAL_RESERVE 122880

Error: 701, Severity: 17, State: 123.
There is insufficient system memory to run this query.

As the errors above indicate, the problem is in reserving a fixed size in the virtual address space of SQL Server. Note: the size indicated above is in bytes, e.g. 65536/1024 = 64 KB.

Typical symptoms you would notice in these situations are:

1) Database or log backups might start failing
2) You are unable to make new connections to SQL
3) Certain jobs which require memory from the non-BPool region will fail

Many a time, these problems go away automatically without any action taken. This indicates an intermittent problem where, at a certain point in time, there was virtual memory pressure which resulted in the above messages being printed to the SQL Errorlog. Dealing with VAS fragmentation or running out of VAS on SQL Server 2000 was rather painful and required setting up additional debugging techniques to get to the bottom of the issue. You might have used some of the following tools:

1) VMSTAT
2) Debugging Tools for Windows (to capture a manual dump of sqlservr.exe)
3) T2551 to generate a filtered dump in SQL Server when running into an OOM condition
4) TLIST.exe to identify modules loaded in SQL Server

Luckily, starting with SQL 2005 there is an in-memory DMV, sys.dm_os_virtual_address_dump, which tracks the virtual address space (VAS) of your SQL Server process.

Here are some queries which will help you find out how much virtual address space on sqlservr.exe is FREE and how much is available in total (Free + InUse).
 

1. Will tell you the size of the biggest contiguous block in VAS

 

SELECT CONVERT(VARCHAR, GETDATE(), 120) AS [Timestamp],
       MAX(region_size_in_bytes)/1024 AS [Total max contiguous block size in KB]
FROM sys.dm_os_virtual_address_dump
WHERE region_state = 0x00010000 -- MEM_FREE
2. This query will also tell us the size of the largest contiguous block plus the region marked as MEM_RESERVE (this is your non-BPool area reserved during SQL startup, sometimes referred to as MTL – MemToLeave):

With VASummary(Size,Reserved,Free) AS
(SELECT
    Size = VaDump.Size,
    Reserved =  SUM(CASE(CONVERT(INT, VaDump.Base)^0)
    WHEN 0 THEN 0 ELSE 1 END),
    Free = SUM(CASE(CONVERT(INT, VaDump.Base)^0)
    WHEN 0 THEN 1 ELSE 0 END)
FROM
(
    SELECT  CONVERT(VARBINARY, SUM(region_size_in_bytes))
    AS Size, region_allocation_base_address AS Base
    FROM sys.dm_os_virtual_address_dump 
    WHERE region_allocation_base_address <> 0x0
    GROUP BY region_allocation_base_address 
UNION  
    SELECT CONVERT(VARBINARY, region_size_in_bytes), region_allocation_base_address
    FROM sys.dm_os_virtual_address_dump
    WHERE region_allocation_base_address  = 0x0
)
AS VaDump
GROUP BY Size)  
SELECT SUM(CONVERT(BIGINT,Size)*Free)/1024 AS [Total avail Mem, KB] ,CAST(MAX(Size) AS BIGINT)/1024 AS [Max free size, KB] 
FROM VASummary 
WHERE Free <> 0


3. Now, the query below will identify the memory reserved by non-BPool components in SQL Server:

SELECT SUM(virtual_memory_reserved_kb)/1024 AS virtual_memory_reserved_mb
FROM sys.dm_os_memory_clerks
WHERE type NOT LIKE '%bufferpool%'
4. To identify if any of the space used is by the SQL Server procedure cache itself, you can use this query:

SELECT SUM(PAGESUSED)*8/1024 AS 'MB of MemToLeave memory consumed by procedures'
FROM MASTER.DBO.SYSCACHEOBJECTS
WHERE PAGESUSED > 1
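On SQL Server 2005 and above, where syscacheobjects is kept only for backward compatibility, a comparable sketch against sys.dm_exec_cached_plans picks out plans larger than 8KB, which are served by the multi-page allocator:

-- Sketch (SQL 2005+): cached plans larger than 8KB come from
-- the multi-page allocator, i.e. outside the Buffer Pool.
SELECT COUNT(*) AS plans_over_8kb,
       SUM(CONVERT(BIGINT, size_in_bytes))/1024/1024 AS total_mb
FROM sys.dm_exec_cached_plans
WHERE size_in_bytes > 8192;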

DBCC MEMORYSTATUS also has good information on whether the usage from the non-BPool region is coming from SQL or non-SQL components. If it is SQL 2000, check the value of “OS Committed”; if it is SQL 2005/2008, look at the value of “MultiPage Allocator” for each memory clerk. Just to re-state my assumption in case you are not sure:

1 SQL Server Page = 8 KB –> SinglePage Allocator
> 1 Page or > 8KB –> MultiPage Allocator

Apart from these, you need to pay special attention to the following components, as all of the below do not use memory from the Buffer Pool region but make direct VirtualAlloc() calls to reserve memory and then commit it:

1) Any OLE/COM components loaded in SQL Server
2) Extended Stored Procedures (use sys.dm_os_loaded_modules to identify the modules loaded in the SQL Server process space; see the query after this list)
3) SQL Mail components
4) Any documents prepared using sp_xml_preparedocument
5) Linked Server providers
6) Large plans stored in the Procedure Cache
7) Very frequent backups may also cause MTL area depletion (if this is the case, investigate the MAXTRANSFERSIZE and BUFFERCOUNT parameters)
8) SQL CLR (recommended to be used on 64-bit SQL Servers)
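For item 2 above, here is the query I had in mind, a sketch for SQL 2005 and above using sys.dm_os_loaded_modules; filtering out Microsoft-signed modules usually surfaces XSP DLLs and COM components quickly:

-- Sketch (SQL 2005+): non-Microsoft modules loaded in the
-- SQL Server process space (XSPs, COM components, etc.).
SELECT name, company, description
FROM sys.dm_os_loaded_modules
WHERE company IS NULL OR company NOT LIKE '%Microsoft%';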

The above list is certainly not exhaustive, but it is more than enough to get started looking in the right areas. While we are on this topic, it is important to understand the difference between a Reserve and a Commit. These are Windows concepts and remain the same for SQL Server as well; after all, it’s VirtualAlloc() underneath the covers.

MEM_COMMIT – Region in the VAS that is backed by RAM/paging file
MEM_RESERVE – Region in the VAS with no actual physical storage, either in RAM or in the paging file

The problems we talked about so far occur when a call to VirtualAlloc() with MEM_RESERVE is made and that “reservation” fails, resulting in the errors printed to the Errorlog. Most times, the call to reserve is subsequently followed by a commit, but it’s not mandatory; I can reserve now and commit later using the base address of the reservation. SQL Server is smart enough to reserve, during startup, a certain region of the address space referred to as MTL or the non-BPool region. It is here that the crunch is, and it is here that the issue needs to be investigated and fixed.

A quick workaround for the above issues is to add the startup parameter –gXXX (sample usage: -g512), where XXX is the amount in MBytes to reserve on startup. I would advise against doing this, as it is a workaround where you are increasing the MTL region rather than finding out who/what is consuming it. Slava’s blog is also a good read on this topic.

 

MTL Consumers

A previous post of mine talked about SQL Server memory architecture in a nutshell. Let’s find out what the common consumers of the much talked about MemToLeave region are, and what you should be looking for in case you run into MemToLeave issues.

Major Consumers of MTL

    1. Use of Linked Servers – You can find out the linked servers that are being used in your environment using the sysservers system catalog (see the query at the end of this post).
    2. Use of XML documents – You would have to find out if any queries or procedures perform any kind of XML data manipulation or use sp_xml_preparedocument.
    3. Extended Stored Procedures or sp_OAcreate calls
      1. Extended stored procedure usage can be identified by inspecting the SQL Server Errorlog and searching for the word “using”. The first XSP call is logged in the Errorlog in the following manner: Using ‘<dll name>’ version ‘<version>’ to execute extended stored procedure ‘<XSP name>’.
      2. If you are using sp_OAcreate, then this information would be logged in the SQL Errorlog for the first invocation of sp_OAcreate using the same pattern mentioned above. The only difference would be that the DLL name would be odsole70.dll.
    4. Query Plans which are larger than 8 KB in size
      1. If you are using SQL Server 2000, query syscacheobjects and search for entries which have values greater than 1 for the pagesused column.
      2. If you are using SQL Server 2005, search for plans greater than 8KB using the DMV sys.dm_exec_cached_plans. Inspect the size_in_bytes column value and search for entries which have values greater than 8192 bytes.
    5. Use of SQLCLR (Applicable to SQL Server 2005 and above) – Check the memory usage for SQLCLR clerk using DBCC MEMORYSTATUS.
    6. Backups using a larger MAXTRANSFERSIZE parameter – In such a case, find the BACKUP DATABASE/LOG commands and inspect the MAXTRANSFERSIZE parameter value. The lowest size that can be provided is 65536 bytes.
    7. Using a network packet size higher than 8192 bytes – You can find all connections on SQL Server 2005 that use a network packet size larger than 8192 bytes using the following DMV query:

select * from sys.dm_exec_connections
where net_packet_size > 8192
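And for item 1 in the list above, here is a sketch to enumerate the linked servers defined on the instance via the sysservers catalog (srvid 0 is the local server itself):

-- Sketch: linked servers defined on the instance.
SELECT srvname, providername, datasource
FROM master.dbo.sysservers
WHERE srvid > 0;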