SQL Saturday 572–Extended Events and Always On

imageThis is the second year that I will be speaking at SQL Saturday at Portland. Last year I had talked about troubleshooting slow queries. This year I will talk about debugging SQL Server issues without having to use the debugger! As always, this will be an awesome prelude to the PASS Summit!

The event sees a great turnout with a number of data professionals talking about various topics. It’s a great opportunity to meet the SQL Server community and network with the folks that help you on the forums and social media. Also, you have a large number of experts under the same roof. So bring your questions.

Session Title: SQL Server Debugging Made Easy using Extended Events
Track: Enterprise Database Administration & Deployment
Level: Intermediate
Time: 1.30PM – 2:45PM
This is a demo filled session that will show you how to debug several common issues in SQL server database engine and identify hotspots as well failures using only extended events. We will showcase how extended events make scenario based troubleshooting easier without having to collect disparate sets of diagnostic data, gather memory dumps and comprise on performance. There have been a plethora of Extended Events which have been added to SQL Server recently based on customer feedback which deprecates the need to run profiler in many commonly encountered situations in production environments. This session covers the new enhancements and capabilities available for Extended Events.

Additional Notes
Understand the diagnostics enhancements in SQL Server database engine Use the diagnostics to troubleshoot and mitigate issues quickly in mission-critical environments Setup session templates proactively to reduce mitigation time during reactive situations.

I will also be delivering  a lightning talk on Enhancements in SQL Server 2016 which will keep your SQL Server environment Always On.

A big thank you to the organizers for giving me the opportunity to speak at the event again and talk about the enhancements that the Tiger team is shipping!


Fluffy in an Availability Group Failover Scenario

Over the past month or so, I have been dealing with a lot of questions around the troubleshooting failover scenarios for Availability Groups. So I decided that it is now time for me to pen down a post on the data to be collected and analysis options for digging into the root cause for an Availability Group. I did have time on my hands and decided to induce a Hollywood element into this post as well. The availability group name that I would be using in this post is named as Fluffy. Fluffy has two secondary Availability Replicas: one synchronous and the other one an asynchronous replica.

As you can see in the screenshot below, I had initiated a failover for my Availability Group and the AlwaysOn
Extended Events sessions shows a state change. The Extended Events session writes to a target file (.xel) which is present in the SQL Server LOG folder.

The Extended Event session runs by default when an Availability Group is configured on the SQL Server instance. The following extended events are captured by the Event Session:

  • sqlserver.alwayson_ddl_executed,
  • sqlserver.availability_group_lease_expired,
  • sqlserver.availability_replica_automatic_failover_validation,
  • sqlserver.availability_replica_manager_state_change,
  • sqlserver.availability_replica_state_change,
  • sqlserver.error_reported

Note that the Extended Events session will only track the state changes for the local replica. The Extended Events session is NOT a global store for all the state change events for all replicas!

The previous set of logs that you collect from the SQL Server failover cluster instances like the SQL Errorlog, Cluster log and Windows Event logs are still applicable for root cause analysis for failovers. However, now you have additional logs in the SQL Server LOG folder which can assist with a root cause analysis for failover issues. The screenshot below shows two new files that would be of interest when analyzing SQL Server failovers namely, the AlwaysOn_health_* and <server name>_<instance name>_SQLDIAG_* logs. The first set of files are the AlwaysOn Extended Events logs and the second set of logs are called the Failover Cluster Instance Diagnostics Log.

We already saw from the above screenshot what the AlwaysOn Extended Events health session can track. Now, let’s see what the Failover Cluster Instance Diagnostics Log collects. There will be multiple informational messages about the activities performed against the Availability Group. Additionally, there will be messages pertaining to the sp_server_diagnostics data (component_health_resultset) collection and the Availability Group state change (availability_group_state_change).

The T-SQL query below can help you fetch the state change information for your SQL Server instance. Again, this is specific to the instance from which you fetched the failover cluster instance diagnostics log:

select object_name,cast(event_data as xml) as xmldata
from sys.fn_xe_file_target_read_file('<file name/path>', null, null, null)
where object_name = 'availability_group_state_change'

A snippet of the XML data retrieved using the above query for the manual failover that I had done is shown below:

<data name=”target_state“>
<data name=”failure condition level“>
<text >SYSTEM_UNHEALTHY</text>

<data name=”availability_group_name”>


In summary, the following sets of logs need to be collected from all the Availability Replicas:

  1. SQL Server Errorlog from the time of the failure
  2. Windows Application and System Event logs from the time of the failure
  3. All the Failover Cluster Instance Diagnostics log (upto a maximum of 10 rollover .xel files by default)
  4. All the AlwaysOn Extended Event session log files (upto a maximum of 4 rollover .xel files by default)
  5. System Health Session Extended Event session files (optional as the component health state information is present in #4)
  6. Windows Cluster log

There are some useful queries in the Books Online topic for the failover cluster instance diagnostics log to parsing through the collected data.

Happy troubleshooting!!

P.S. The above blog post was created using a lab environment provided by SQL Server Virtual Labs. This is an online environment which allows you to create virtual machines to practice various SQL Server scenarios. The lab that I used was “SQL Server 2012: AlwaysOn Availability Groups (SQL 142).