Performance issues due to USERSTORE_TOKENPERM

In SQL Server 2005, a known issue can degrade query performance on 32-bit and 64-bit systems with 2 gigabytes (GB) or more of memory. When you execute queries under the context of a login that is not a member of the sysadmin fixed server role, you might encounter performance degradation symptoms that arise from a large and growing Security Token cache. These issues can include degraded query performance, high CPU usage for the SQL Server process, and a sharp increase in worker threads and SQL user connections. Systems that have less than 2 GB of memory should not experience these issues because normal memory usage requirements keep the Security Token cache from growing too large.

If you are facing the symptoms below, it is a good time to discuss applying SP3 with CU1 with your management, or to apply the workarounds mentioned in the articles below. The most common workaround is a job that runs at periodic intervals and cleans up the TokenAndPermUserStore cache using DBCC FREESYSTEMCACHE ('TokenAndPermUserStore').
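
A minimal sketch of such a job step is shown below. The size check against sys.dm_os_memory_clerks and the 500 MB threshold are illustrative additions on my part, not values prescribed by the KB articles listed further down, so tune them for your environment:

  -- Illustrative job step: free the security token cache only once it has grown large
  DECLARE @cache_mb decimal(18,2)

  SELECT @cache_mb = SUM(single_pages_kb + multi_pages_kb) / 1024.0
  FROM sys.dm_os_memory_clerks
  WHERE type = 'USERSTORE_TOKENPERM'

  IF @cache_mb > 500   -- illustrative threshold
      DBCC FREESYSTEMCACHE ('TokenAndPermUserStore')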

Symptoms

  1. A large number of users run queries on the SQL instance, and you find that the duration of queries increases intermittently
  2. Most queries in the environment exhibit this pattern with random parameters; it is not tied to any specific query/proc with any specific parameter
  3. DBCC MEMORYSTATUS output for the SQL instance shows that SQL Server memory usage is gradually increasing over a period of time
  4. High CPU during the problem period
  5. DBCC FREEPROCCACHE or DBCC FREESYSTEMCACHE fixes the issue temporarily

Related Articles

  • KB article 927396: Queries take a longer time to finish running when the size of the TokenAndPermUserStore cache grows in SQL Server 2005
  • KB article 933564: FIX: A gradual increase in memory consumption for the USERSTORE_TOKENPERM cache store occurs in SQL Server 2005
  • KB article 937745: FIX: You may receive error messages when you try to log in to an instance of SQL Server 2005 and SQL Server handles many concurrent connections
  • KB Article 959823: How to customize the quota for the TokenAndPermUserStore cache store in SQL Server 2005 Service Pack 3
  • http://blogs.msdn.com/psssql/archive/2008/06/16/query-performance-issues-associated-with-a-large-sized-security-cache.aspx

Deadlock Troubleshooting

    Deadlock troubleshooting has always been a bit tricky. A deadlock is a special type of blocking scenario, but blocking and deadlocking are not the same thing. Deadlocks are a natural side effect of blocking. The most typical deadlock solution is either a stored proc/app code optimization, or a change in the database schema or indexing strategy.

    To capture a deadlock graph in the SQL Server ERRORLOG, use the following trace flags:

    • For SQL Server 2000: 1204 & 3605
    • For SQL Server 2005: 1222 & 3605

    The trace flags can be enabled using DBCC TRACEON(<trace flag #>,-1)
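
    For example, on a SQL Server 2005 instance the flags can be turned on globally as shown below; they can also be added as -T1222 and -T3605 startup parameters so that they survive a service restart:

    DBCC TRACEON (1222, 3605, -1)
    -- Verify which global trace flags are currently active
    DBCC TRACESTATUS (-1)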

    Refer to Bart Duncan’s Weblog for deadlock troubleshooting. Bart was an Escalation Engineer for SQL Server, and his three-part blog series on deadlock troubleshooting is probably the most definitive guide on this topic.

    http://blogs.msdn.com/bartd/archive/2006/09/09/747119.aspx

    The -T1204 output can be interpreted using the following file: http://blogs.msdn.com/bartd/attachment/747119.ashx

    If you find that your SQL Server is encountering deadlocks, here is the data that you need to collect:

    1. SQL Server Profiler traces (refer to my server-side profiler traces post for setting up profiler traces) with the following events:

    • All statement level events
    • Showplan All events, because the statement that is chosen as the deadlock victim will not have a Statistics Profile event since it would have been terminated
    • Lock:Deadlock and, if you are using SQL Server 2005, the Deadlock Graph event as well

    2. Perfmon Logs

    3. SQL 2000 Blocker Script output/SQL 2005 PerfStats

    Additionally, as a developer, if you are concerned about your statements being terminated due to deadlocks, you must build retry logic into your code to handle the deadlock and resubmit the query. This can be done using a TRY...CATCH block within the query batch that you are executing, as sketched below.
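
    A minimal sketch of such retry logic is shown below; the UPDATE statement, table name and retry count are placeholders for illustration. Error 1205 is the error raised for the session chosen as the deadlock victim:

    DECLARE @retry int
    SET @retry = 3

    WHILE @retry > 0
    BEGIN
        BEGIN TRY
            BEGIN TRANSACTION
            -- Placeholder for the actual work that is being deadlocked
            UPDATE dbo.Orders SET Status = 'Processed' WHERE OrderID = 42
            COMMIT TRANSACTION
            SET @retry = 0                  -- success: exit the loop
        END TRY
        BEGIN CATCH
            IF XACT_STATE() <> 0 ROLLBACK TRANSACTION
            IF ERROR_NUMBER() = 1205 AND @retry > 1
                SET @retry = @retry - 1     -- deadlock victim: retry the batch
            ELSE
            BEGIN
                SET @retry = 0              -- other error or retries exhausted
                RAISERROR ('Statement failed after deadlock retries or with a non-deadlock error.', 16, 1)
            END
        END CATCH
    END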

    SQL Nexus

    This is a tool that is used by the Microsoft PSS team to analyze PSSDiag data. This tool has the capability of loading the Profiler Traces collected (provided ReadTrace is installed on the box).

    In addition to this, it also loads the other .OUT files and the PerfStats script output file into the Nexus database. I use this tool regularly to analyze PSSDIAG data for the following reasons:
    1. It saves me the time of manually opening each file and looking through it
    2. Based on the graphs provided in the reports, I can quickly drill down to the major bottleneck

    The SQL Nexus reports act as a brilliant guide when you are analyzing PSSDIAG for the following scenarios:
    1. High CPU
    2. Blocking
    3. High IO
    4. General Performance Problems

    Since the SQL Nexus reports use the Reporting Services client-side report viewer, you can create your own reports to work against the Nexus database and then deploy them to the %appdata%/SQL Nexus/Reports folder. Once this is done, the new reports will show up along with the default reports every time SQL Nexus is launched.

    What are the advantages of this tool?
    1. Shows aggregation reports for the Profiler Traces loaded into it
    2. Blocking and wait statistics information based on the PerfStats script output, which helps in analyzing blocking on the server
    3. Since the collected data is loaded into tables, you can run queries against the database to draw inferences. If you use these queries frequently enough, you can create your own reports and deploy them to the location mentioned above
    4. You can export the data from the SQL Nexus reports into files (.XLS, .PDF, .JPG, etc.) and send them as reports
    5. The reports let you narrow down to specific time frames which makes analysis easier
    Download location: http://www.codeplex.com/sqlnexus

    Addition: April 4, 2011

    A list of known issues documented with SQL Nexus:

    http://blogs.msdn.com/b/sqlserverfaq/archive/2011/02/07/sql-nexus-and-rml-utilities-a-primer-on-the-issues-that-i-have-dealt-with-last-year.aspx

    Addition: April 6, 2011

    A very common issue that I worked on multiple times, after which I realized that it warranted a blog post:

    SQLDIAG: There are no instances of version 10 on this computer

    RML Utilities OR ReadTrace

    The Microsoft Download site has a tool called RML Utilities which can be used to process the profiler traces collected from a SQL Server instance.
    SQL Profiler traces are very helpful when you need to perform a bottleneck analysis in any SQL environment or need to find out the before & after picture. But the daunting task remains analyzing the Profiler traces, because of the size and volume of the trace files.

    SQL Server provides a built-in function, fn_trace_gettable(), to load the collected profiler traces into a SQL Server database table.
    Then you can run queries on this table to draw inferences from the profiler traces collected during that period. Let's say you wanted to find all queries that had a duration above 10,000 and group the results by host name; you could write a query in the following manner:

    SELECT hostname, count(*) as counts
    FROM tbl_trace
    WHERE Duration > 10000
    GROUP BY hostname
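
    For completeness, the tbl_trace table used in the query above can be populated from the collected trace files using fn_trace_gettable(); the file path below is illustrative, and passing default as the second argument also reads any rollover files:

    SELECT *
    INTO tbl_trace
    FROM sys.fn_trace_gettable('D:\PSSDIAG\PerfTrace.trc', default)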

    ReadTrace, which ships with RML Utilities, is the tool used by Microsoft PSS to analyze the SQL Server Profiler traces collected by the PSSDIAG utility.

    For more details, please refer:
    Replay Markup Language
    http://support.microsoft.com/?kbid=944837
    RML Utilities (x86) Download
    http://www.microsoft.com/downloads/details.aspx?familyid=7EDFA95A-A32F-440F-A3A8-5160C8DBE926&displaylang=en
    RML Utilities (x64) Download
    http://www.microsoft.com/downloads/details.aspx?familyid=B60CDFA3-732E-4347-9C06-2D1F1F84C342&displaylang=en

    My Transaction Log is Growing!!!

    The transaction log growth issue… What do we do?? A few things NOT TO do when this happens:
    1. Delete the LDF file of the database
    2. Create a new LDF file if there is no space on the disk drives and the SQL Server ERRORLOG is reporting 900* messages. (Personal opinion: I have seen a lot of situations where this has gone horribly wrong on production environments)
    3. Take the database offline
    4. Restart the SQL Server service

    Believe me, the above options are sometimes exercised in production scenarios.

    Now that the above steps are avoided, what should be done? The T-LOG of the database in question can be shrunk with a simple set of steps, as mentioned in the KB articles below. My troubleshooting steps involve the following (a sketch of the corresponding commands follows the list):
    1. Check the current SQL Server ERRORLOG for any 900* error messages related to the T-LOG
    2. If you have enough disk space and the database is not in SIMPLE recovery, take a backup of the transaction log so that you can free up most of the currently active virtual log files
    3. Use DBCC SQLPERF (LOGSPACE) to find out the current used percentage of the transaction log for the database in question
    4. If it's SQL Server 2005, check the log_reuse_wait_desc value in the sys.databases catalog view output. This will tell you why the database transaction log is not being reused.
    5. Find out if there are any open transactions using the DBCC OPENTRAN command. KILL the session owning the transaction if you can ascertain that the open transaction can be redone or recovered later.
    6. Try shrinking the T-LOG using DBCC SHRINKFILE or, if it is in SIMPLE recovery, try the BACKUP LOG WITH TRUNCATE_ONLY command. But be advised that this command will truncate your transaction log. Alternatively, you can try firing a CHECKPOINT in the database and then trying to truncate the transaction log; this works only for SQL Server 2000 instances.
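
    A sketch of the commands referenced in steps 3 through 6 is shown below; the database name and log file name are illustrative:

    -- Step 3: current log size and space used for every database
    DBCC SQLPERF (LOGSPACE)

    -- Step 4 (SQL Server 2005): why the transaction log cannot be reused
    SELECT name, log_reuse_wait_desc
    FROM sys.databases
    WHERE name = 'MyDatabase'

    -- Step 5: oldest active transaction in the database
    DBCC OPENTRAN ('MyDatabase')

    -- Step 6: shrink the log file; the target size is in MB
    USE MyDatabase
    DBCC SHRINKFILE (MyDatabase_log, 1024)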

    ** I know a lot of people frown on the shrink and truncate options but when you are strapped for time, this might be your ONLY option.

    If you are truncating the log, then you need to take a full backup and then continue your log backups (if the FULL/BULK_LOGGED recovery model is used), since truncation breaks your log chain. You should always try to find out the RCA for the issue (if possible and if you get the chance) because you don't want the same thing (as Bruce Willis says in Die Hard) to happen to you twice!!
    Another important thing is to make sure that your autogrowth considerations are taken into account. The following article should help you decide what needs to be done when setting the autogrowth of a database file: http://support.microsoft.com/kb/315512/en-us
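
    For example, switching a log file from percentage-based growth to a fixed increment could look like the statement below; the database name, logical file name and growth size are illustrative:

    ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDatabase_log, FILEGROWTH = 512MB)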

    For more information, see the following topics in SQL Server Books Online:
    • Shrinking the transaction log
    • DBCC SHRINKFILE (Transact-SQL)
    • Truncating the transaction log

    Useful articles
    How to use the DBCC SHRINKFILE statement to shrink the transaction log file in SQL Server 2005
    http://support.microsoft.com/kb/907511/en-us
    INF: Shrinking the Transaction Log in SQL Server 2000 with DBCC SHRINKFILE
    http://support.microsoft.com/kb/272318/en-us
    INF: How to Shrink the SQL Server 7.0 Transaction Log
    http://support.microsoft.com/kb/256650/en-us
    How to stop the transaction log of a SQL Server database from growing unexpectedly
    http://support.microsoft.com/kb/873235/en-us
    A transaction log grows unexpectedly or becomes full on a computer that is running SQL Server
    http://support.microsoft.com/kb/317375/en-us

    Common Reasons for T-LOG growth
    1. Wrong autogrowth values (Mostly high percentage values)

    2. Index rebuild operations done infrequently on a database that is in the FULL recovery model or has log shipping/mirroring enabled

    3. Something failing in your replication topology

    4. BULK insert/update operations

    5. Long running transactions

    In case the above-mentioned steps and articles do not help you, please call Microsoft SQL Support.