Change Tracking Cleanup

Published two posts on the Tiger team blog for Change Tracking cleanup:

Part 1 – Basics about Change Tracking Cleanup

Part 2 – Deep dive on Change Tracking Cleanup

SQL Saturday 511: Extended Events and more

Extended Events have evolved a lot since they were first introduced in SQL Server 2008. Today, there are multiple extended events available in SQL Server that allow you to debug common scenarios without having to collect memory dumps or diagnostic data, which can be detrimental to your SQL Server instance’s performance.

On April 2nd, join me in Redmond to understand how to leverage the new extended events that are available in SQL Server 2012 Service Pack 3 and above in your environments. I will talk about common scenarios where the new extended events are available and provide canned scripts to help collect and analyze data for complex scenarios.

A host of SQL experts, as well as folks from the Microsoft SQL Server product group, will be at the event to present and answer any questions that you have! Looking forward to meeting the SQL Community at the event!

WHEN: April 2nd, 2016, 2.15 – 3.15 PM PST
SESSION: Troubleshooting made easier using Extended events

Details about the session are available here. The full schedule of the event is available here.

The slide deck that I used during the session is available below.

24HOP Session: How to be a Ninja – Troubleshooting SQL PERF on Azure VMs

I have been a bit quiet on this blog but that is partly due to the fact that I have moved into a new role and a new country. I am now a part of the SQL Server Product Group [b|t] and based out of Redmond.

I am actually really excited about this. Sourabh Agarwal [b|t] and I are going to be presenting a preview of our SQL PASS Summit 2015 pre-con session "How to be a Ninja: Troubleshooting SQL performance on Azure Virtual Machines". Yes, we are starting the initiation program for becoming a SQL performance troubleshooting NINJA on SQL Server!

Troubleshooting is an art but the tricks of the trade changed with the advent of Azure Virtual Machines. Performance troubleshooting is different and at the same time very similar to what you have been used to for SQL Server. SQL Server performance on Azure VMs can be a sore point for many as the host troubleshooting entry points are limited and the knowledge of the internal workings scarce.

In this session, we will show you the best practices you should know for SQL Server instances running on Azure Virtual Machines! We will share tips on automating the implementation of all these best practices during deployment, making this a single one-click deployment. This session will be a precursor to our pre-con, where we will go the whole nine yards and detail how to automate deployments from scratch, implement best practices automatically and analyze performance issues magically!

We hope you can join us for this session online and we do hope to see you during our pre-con! The 24Hop sessions are full of great sessions from great speakers in the SQL Family. See the full list here. I would recommend looking through the list and signing up for the ones that you are interested in. This will also give you a preview of what you can expect in the SQL PASS Summit this year.

This 24 Hours of PASS: Summit 2015 Preview event takes place over 24 hours, beginning September 17, 2015, 12:00 UTC. Featuring 24 webcasts delivered over 24 hours, this event provides a glimpse into the unparalleled content on offer at PASS Summit 2015, October 27-30, in Seattle, WA.

WHEN: September 17th at 8AM PST (3PM GMT)
WHERE: ONLINE
Facebook Event for our session: https://www.facebook.com/events/938656286172663/
Registration link for the event: http://www.sqlpass.org/24hours/2015/summitpreview/Registration.aspx

[UPDATE] September 29th, 2015

Thank you for the feedback that you shared after the session. It is always great to know what people liked in the session and even better to know where we need to improve. This helps ensure that our next iteration has the necessary tweaks. Overall, we received 90% positive feedback and we thank everyone who attended for that!

The replies to the questions from the session are available below.

Q. Regarding the performance fixes as best practices (hotfixes/CUs), do we have separate hotfixes (.msi/.msp) for the Azure environment when compared to an on-premises environment?
A. The SQL Server installation bits that you would run on Azure VMs and on virtualized/physical on-premises environments are the same. So there is no separate set of fixes that needs to run on Azure VMs.

Q. Are these Cheat Sheets available online?
A. The cheat sheets are available in the presentation PDF on the 24HOP site.

Q. Does using "Lock Pages in Memory" mean that the total memory allocated to the SQL Server process is not shown in Windows Task Manager?
A. Task manager is not a good place to look for allocated memory when you want to find out allocations made after enabling Lock Pages in Memory privilege for the SQL Server service account. You could either look at Total Server Memory perfmon counter or the memory DMVs to track SQL Server memory usage. Additional reference: https://msdn.microsoft.com/en-us/library/ms176018.aspx 
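As a rough sketch of the DMV and perfmon-counter route mentioned in the answer above, the two queries below read the process memory DMV and the Memory Manager counters. This assumes SQL Server 2008 or later and the VIEW SERVER STATE permission. The column aliases are only illustrative.

```sql
-- Process-level view of SQL Server memory.
-- locked_page_allocations_kb is non-zero only when locked pages are in use,
-- which is the portion Task Manager does not show for the process.
SELECT
    physical_memory_in_use_kb  / 1024 AS physical_memory_in_use_mb,
    locked_page_allocations_kb / 1024 AS locked_page_allocations_mb,
    memory_utilization_percentage
FROM sys.dm_os_process_memory;

-- The same Total/Target Server Memory values that perfmon exposes.
SELECT
    RTRIM(counter_name) AS counter_name,
    cntr_value / 1024   AS value_mb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE N'%Memory Manager%'
  AND counter_name IN (N'Total Server Memory (KB)', N'Target Server Memory (KB)');
```

Comparing the locked page figure with Total Server Memory gives a quick idea of how much of the memory is invisible to Task Manager.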

Q. Why are you disabling caching on the log file drive?
A. This is due to the IO patterns that the SQL Server transaction log file receives and how Azure storage is structured. We have seen in tests that the performance of the SQL Server transaction log is best when write caching is disabled for the disks that host transaction log files. We will talk about this in detail during our pre-con session.

Q. For data warehousing workloads, do you recommend the Lock Pages in Memory setting on an on-premises/Azure VM hosting SQL Server?
A. For on-premises workloads, we recommend that you test and ascertain the need before enabling the Lock Pages in Memory (LPIM) privilege. For Azure VM workloads, the first important task is to pick the machine with the right SKU. We recommend enabling LPIM to prevent paging to the local disk on the rack, which can negatively affect performance.

Q. Why are there different storage options based on Windows version? Is there any dependency on SQL versions?
A. There aren’t different storage options based on Windows version. The different storage options are based on the performance tier that you want to be on. It is Windows and SQL version and release agnostic.

Q. Can you let me know the resources on Azure Storage?
A. The Azure storage documentation is a good place to start for this. We will talk about this in detail in the IaaS introduction part of our pre-con.

If we have missed any question, please leave your question in the comment section of this post and we will answer it.

Lastly, we loved the notes that Matt Penny [t] took during our session. A screenshot of that is shown below. Thank you, Matt! :) The 24HOP session presentation is attached on the session page.

Notes

SQL Server 2012: Debugging Deadlocked Schedulers

I had blogged about debugging deadlocked schedulers earlier for SQL Server 2008 R2 and below releases. Since there have been some fundamental changes in SQL Server 2012, I thought it would be a good idea to post about debugging the same scenario on SQL Server 2012 instances.

Older posts:
Debugging Deadlocked Schedulers Part 1
Debugging Deadlocked Schedulers Part 2


CScript and RunAsAdmin

I had written a script a while back which would set the TCP/IP port for a SQL Server instance. Before you start throwing brickbats at me… PowerShell was not in existence in those days and yes… doing the same tasks in PowerShell is much easier! Phew… Now let me get back to my story!

One of my colleagues told me that the script was failing with the following error message:

SQL_PortChange.vbs(52, 1) Microsoft VBScript runtime error: Object required: 'objOutParams'

Now the above error is not the most intuitive of error messages, considering the fact that the object is being assigned a value in my VBScript. With a little bit of troubleshooting, we figured out that the RunAs Administrator option (it can really catch you off-guard) was not used to launch the command prompt.

So when running such VBScripts, do not forget to use the RunAs Administrator option!

Now let us look under the hood a bit! I was naturally curious as to why the access denied message was not being thrown back to the user. I used Process Monitor to trace the registry activity of cscript.exe and of wmiprvse.exe, which actually works in the background to perform the tasks mentioned in the VBScript. I found that there were Access Denied messages in the Process Monitor trace but they were not being bubbled up to the user (see screenshot below)!

image

As you can see above, the access denied was on the SQL Server TCP/IP registry key and the WBEM keys. Since the registry key could not be read, the object was not created, which is why we got the weird error listed above.

I thought this would be a good issue to blog about in case someone else ran into a similar issue!

Fluffy in an Availability Group Failover Scenario

Over the past month or so, I have been dealing with a lot of questions around troubleshooting failover scenarios for Availability Groups. So I decided that it was time to pen down a post on the data to be collected and the analysis options for digging into the root cause of an Availability Group failover. I did have time on my hands and decided to introduce a Hollywood element into this post as well. The availability group that I use in this post is named Fluffy. Fluffy has two secondary Availability Replicas: one synchronous and one asynchronous.

As you can see in the screenshot below, I had initiated a failover for my Availability Group and the AlwaysOn Extended Events session shows a state change. The Extended Events session writes to a target file (.xel) which is present in the SQL Server LOG folder.

The Extended Event session runs by default when an Availability Group is configured on the SQL Server instance. The following extended events are captured by the Event Session:

  • sqlserver.alwayson_ddl_executed,
  • sqlserver.availability_group_lease_expired,
  • sqlserver.availability_replica_automatic_failover_validation,
  • sqlserver.availability_replica_manager_state_change,
  • sqlserver.availability_replica_state_change,
  • sqlserver.error_reported

Note that the Extended Events session will only track the state changes for the local replica. The Extended Events session is NOT a global store for all the state change events for all replicas!
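As a minimal sketch, the target files of this session can be read with sys.fn_xe_file_target_read_file to see which of the events listed above were captured on the local replica. The folder path below is hypothetical and needs to be replaced with the LOG folder of your own instance. The AlwaysOn_health*.xel wildcard picks up all the rollover files.

```sql
-- Hypothetical path: point this at the LOG folder of your own instance.
DECLARE @xel_path nvarchar(260) =
    N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\AlwaysOn_health*.xel';

-- How many times was each event captured on this (local) replica?
SELECT object_name AS event_name, COUNT(*) AS occurrences
FROM sys.fn_xe_file_target_read_file(@xel_path, NULL, NULL, NULL)
GROUP BY object_name
ORDER BY occurrences DESC;

-- Replica state changes in chronological order.
-- The timestamp attribute of an Extended Events record is in UTC.
SELECT
    x.event_xml.value('(/event/@timestamp)[1]', 'datetime2(3)') AS event_time_utc,
    x.event_xml
FROM sys.fn_xe_file_target_read_file(@xel_path, NULL, NULL, NULL) AS f
CROSS APPLY (SELECT CAST(f.event_data AS xml) AS event_xml) AS x
WHERE f.object_name = N'availability_replica_state_change'
ORDER BY event_time_utc;
```

Since the session only tracks the local replica, the same queries would have to be run against the files from each replica to piece together the full picture.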

The previous set of logs that you collect from the SQL Server failover cluster instances like the SQL Errorlog, Cluster log and Windows Event logs are still applicable for root cause analysis for failovers. However, now you have additional logs in the SQL Server LOG folder which can assist with a root cause analysis for failover issues. The screenshot below shows two new files that would be of interest when analyzing SQL Server failovers namely, the AlwaysOn_health_* and <server name>_<instance name>_SQLDIAG_* logs. The first set of files are the AlwaysOn Extended Events logs and the second set of logs are called the Failover Cluster Instance Diagnostics Log.

We already saw from the above screenshot what the AlwaysOn Extended Events health session can track. Now, let’s see what the Failover Cluster Instance Diagnostics Log collects. There will be multiple informational messages about the activities performed against the Availability Group. Additionally, there will be messages pertaining to the sp_server_diagnostics data (component_health_resultset) collection and the Availability Group state change (availability_group_state_change).

The T-SQL query below can help you fetch the state change information for your SQL Server instance. Again, this is specific to the instance from which you fetched the failover cluster instance diagnostics log:

select object_name,cast(event_data as xml) as xmldata
from sys.fn_xe_file_target_read_file('<file name/path>', null, null, null)
where object_name = 'availability_group_state_change'

A snippet of the XML data retrieved using the above query for the manual failover that I had done is shown below:

<data name="target_state">
<value>2</value>
<text>Online</text>
</data>
<data name="failure_condition_level">
<value>3</value>
<text>SYSTEM_UNHEALTHY</text>
</data>

<data name="availability_group_name">
<value>FLUFFY</value>
</data>

</event>
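Individual fields like the ones in the snippet above can be pulled out as columns with the XML value() method. The sketch below is self-contained: it uses a trimmed, hand-built event that contains only the three fields shown in this post, and it assumes the field names use underscores (target_state, failure_condition_level, availability_group_name), as Extended Events field names normally do.

```sql
-- Trimmed sample event built from the fields shown in this post.
-- A real event carries more attributes and data elements.
DECLARE @event xml = N'
<event name="availability_group_state_change">
  <data name="target_state"><value>2</value><text>Online</text></data>
  <data name="failure_condition_level"><value>3</value><text>SYSTEM_UNHEALTHY</text></data>
  <data name="availability_group_name"><value>FLUFFY</value></data>
</event>';

-- Mapped fields expose the readable name in <text>; plain fields only have <value>.
SELECT
    @event.value('(/event/data[@name="availability_group_name"]/value)[1]', 'nvarchar(128)') AS availability_group_name,
    @event.value('(/event/data[@name="target_state"]/text)[1]', 'nvarchar(60)')              AS target_state,
    @event.value('(/event/data[@name="failure_condition_level"]/text)[1]', 'nvarchar(60)')   AS failure_condition_level;
```

The same value() expressions can be applied to the xmldata column returned by the query against the diagnostics log file.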


In summary, the following sets of logs need to be collected from all the Availability Replicas:

  1. SQL Server Errorlog from the time of the failure
  2. Windows Application and System Event logs from the time of the failure
  3. All the Failover Cluster Instance Diagnostics logs (up to a maximum of 10 rollover .xel files by default)
  4. All the AlwaysOn Extended Event session log files (up to a maximum of 4 rollover .xel files by default)
  5. System Health Session Extended Event session files (optional as the component health state information is present in #4)
  6. Windows Cluster log

The Books Online topic for the failover cluster instance diagnostics log has some useful queries for parsing the collected data.

Happy troubleshooting!!

P.S. The above blog post was created using a lab environment provided by SQL Server Virtual Labs. This is an online environment which allows you to create virtual machines to practice various SQL Server scenarios. The lab that I used was “SQL Server 2012: AlwaysOn Availability Groups (SQL 142)”.

SQL Server is hung!

In the recent past, I had to work on a SQL Server 2000 instance which became unresponsive a short time after the service was restarted. Since this was SQL Server 2000, I couldn’t use a Dedicated Administrator Connection (DAC) to log into the instance and figure out what was happening in the SQL Server engine that stopped it from accepting new connections.

After the SQL Server service restart, the ERRORLOG very happily indicated no issues, and if you weren’t already ready to tear your hair out due to the lack of error messages, the connection failure reported the most generic of error messages:

Server: Msg 11, Level 16, State 1
[Microsoft][ODBC SQL Server Driver][TCP/IP Sockets]General network error. Check your network documentation.

I did the basic due diligence to check if the network protocols were enabled and if the port on which the SQL Server instance was supposed to listen was actually open. I also checked the netstat output for activity on the port and found a large number of connections on the SQL Server port. A quick count of those connections ruled out a TCP port exhaustion issue. The Errorlog didn’t even report a Deadlocked Schedulers condition to tell me that there was an issue.
