Fluffy in an Availability Group Failover Scenario


Over the past month or so, I have been dealing with a lot of questions around the troubleshooting failover scenarios for Availability Groups. So I decided that it is now time for me to pen down a post on the data to be collected and analysis options for digging into the root cause for an Availability Group. I did have time on my hands and decided to induce a Hollywood element into this post as well. The availability group name that I would be using in this post is named as Fluffy. Fluffy has two secondary Availability Replicas: one synchronous and the other one an asynchronous replica.

As you can see in the screenshot below, I had initiated a failover for my Availability Group and the AlwaysOn
Extended Events sessions shows a state change. The Extended Events session writes to a target file (.xel) which is present in the SQL Server LOG folder.

The Extended Event session runs by default when an Availability Group is configured on the SQL Server instance. The following extended events are captured by the Event Session:

  • sqlserver.alwayson_ddl_executed,
  • sqlserver.availability_group_lease_expired,
  • sqlserver.availability_replica_automatic_failover_validation,
  • sqlserver.availability_replica_manager_state_change,
  • sqlserver.availability_replica_state_change,
  • sqlserver.error_reported

Note that the Extended Events session will only track the state changes for the local replica. The Extended Events session is NOT a global store for all the state change events for all replicas!

The previous set of logs that you collect from the SQL Server failover cluster instances like the SQL Errorlog, Cluster log and Windows Event logs are still applicable for root cause analysis for failovers. However, now you have additional logs in the SQL Server LOG folder which can assist with a root cause analysis for failover issues. The screenshot below shows two new files that would be of interest when analyzing SQL Server failovers namely, the AlwaysOn_health_* and <server name>_<instance name>_SQLDIAG_* logs. The first set of files are the AlwaysOn Extended Events logs and the second set of logs are called the Failover Cluster Instance Diagnostics Log.

We already saw from the above screenshot what the AlwaysOn Extended Events health session can track. Now, let’s see what the Failover Cluster Instance Diagnostics Log collects. There will be multiple informational messages about the activities performed against the Availability Group. Additionally, there will be messages pertaining to the sp_server_diagnostics data (component_health_resultset) collection and the Availability Group state change (availability_group_state_change).

The T-SQL query below can help you fetch the state change information for your SQL Server instance. Again, this is specific to the instance from which you fetched the failover cluster instance diagnostics log:

select object_name,cast(event_data as xml) as xmldata
from sys.fn_xe_file_target_read_file('<file name/path>', null, null, null)
where object_name = 'availability_group_state_change'

A snippet of the XML data retrieved using the above query for the manual failover that I had done is shown below:

<data name=”target_state“>
<
value>2</value>
<text>Online</text>
</data>
<data name=”failure condition level“>
<value>3</value>
<text >SYSTEM_UNHEALTHY</text>
</data>


<data name=”availability_group_name”>
<value>FLUFFY</value>
</data>

</event>


In summary, the following sets of logs need to be collected from all the Availability Replicas:

  1. SQL Server Errorlog from the time of the failure
  2. Windows Application and System Event logs from the time of the failure
  3. All the Failover Cluster Instance Diagnostics log (upto a maximum of 10 rollover .xel files by default)
  4. All the AlwaysOn Extended Event session log files (upto a maximum of 4 rollover .xel files by default)
  5. System Health Session Extended Event session files (optional as the component health state information is present in #4)
  6. Windows Cluster log

There are some useful queries in the Books Online topic for the failover cluster instance diagnostics log to parsing through the collected data.

Happy troubleshooting!!

P.S. The above blog post was created using a lab environment provided by SQL Server Virtual Labs. This is an online environment which allows you to create virtual machines to practice various SQL Server scenarios. The lab that I used was “SQL Server 2012: AlwaysOn Availability Groups (SQL 142).

Advertisements

Awesomesauce: Finding out missing sequences


Another of those #sqlhelp inspired posts. This was around ways to track down missing numbers in a sequence table. This task can be done in multiple ways and as I like to say that there are multiple ways to skin a cat! However, this blog post is about using the new T-SQL enhancements in SQL Server 2012 to figure out missing sequential numbers. I will demonstrate this using an example.

I have a table tblsequences which has two integer columns with the primary key being an int datatype. I did some random inserts into the table. The script below can be used to find out missing sequential numbers in an identity column or an integer column which is supposed to store sequential values.


-- Replace starting value with minimum starting value and increment for your sequence
-- Replace the table name with the table name that you are interested in
declare @startvalue int = 1, @increment int = 1
;with cte as
(
select a,(a-lag(a,1) OVER (ORDER BY a)) as MissingSequences
from tblsequences
)
select a, (MissingSequences/@increment)-1 as MissingSequences
from cte
where MissingSequences > @increment
union all
select TOP 1 MIN (a),
CASE (MIN(a)- @startvalue)/@increment
when 0 then null
else (MIN(a)- @startvalue)/@increment
end as MissingSequences
from tblsequences
group by a
order by a

The output is as shown below in the screenshot. You will notice that the first column reports the primary key value and the MissingSequences column reports the number of missing sequential values lesser than the value in the first column. You will need to make the necessary changes to the two scripts shown in this blog post so that the tables you want analyzed are done so.

image

Next, lets talk about Sequences which is a new feature in SQL Server 2012. I created a new sequence using the a start value of 10 and an increment value of 3.

I used the Sequence which was created for generating OrderID values in two different tables. Now we are posed with an interesting problem. I have two different tables in which I want to compare missing sequence numbers. This can be done with T-SQL code below.


-- Replace schema name, table name(s) and sequence name as appropriate
declare @startvalue int = 1,@interval int = 1, @seqname sysname = 'TestSeq', @schemaname sysname = 'dbo'
select @startvalue = TRY_CAST(TRY_CAST(start_value as varchar(255)) as int),
@interval = TRY_CAST(TRY_CAST(increment as varchar(255)) as int)
from sys.sequences
where name = @seqname and [schema_id] = (select [schema_id] from sys.schemas where name = @schemaname)

if (@startvalue IS NOT NULL and @interval IS NOT NULL)
begin
;with cte as
(
select OrderID,(OrderID-lag(OrderID,1) over (order by OrderID)) as MissingSequences
from
(select OrderId as OrderID
from tblTestSeq
union all
select OrderId as OrderID
from tblTestSeq_2) A
)
select OrderID, (MissingSequences/@interval)-1 as MissingSequences
from cte
where MissingSequences > @interval
union all
select TOP 1 OrderID,
CASE (MIN(OrderID)- @startvalue)/@interval
when 0 then null
else (MIN(OrderID)- @startvalue)/@interval
end as MissingSequences
from tblTestSeq
group by OrderID
order by OrderID
end
else

else
begin
PRINT 'CAST FAILED'
end


The logic used is the same as the logic when we were looking for missing sequences in identity/integer columns described in the first half of the post. However, since sequences can be used for one or more tables, I have used a UNION query to get all the columns together which use the same sequence object. The second result set of the output in the screenshot below shows the tables involved along with the sequential OrderIDs. As you can see below, there are 4sequence numbers missing  10, 13, 16 and 28. The first result set reports exactly that by notifying that there are 3 sequence ids missing lesser than 19 and 1 sequence number missing lesser than 31.

The output of the above T-SQL script is shown in the screenshot below.

image

Another way how T-SQL enhancements in SQL Server 2012 can make your life easier.

Happy T-SQL coding and a have a great weekend!

Get your log shipping restore to run faster


Recently I was working on a customer environment where the requirement was to use a log shipped secondary database for reporting purposes. The catch here was that the restore of all the transaction log backups needed to complete within a fixed time.

Right at the outset, I would like to point out that the restore of transaction logs are dependent on a number of parameters like disk subsystem performance, number of transactions present in the log backup, resource availability on the server where the restore is occurring. Considering that all the factors remain constant, there is a basic difference between a secondary log shipped database in No Recovery and in Standby mode. In Standby mode, a partial recovery is done with the help of a transaction undo file to get the database into a read-only state. This partial recovery operation and undo that occurs while performing the next log restore (with the use of the transaction undo file) requires time and resources and can slow down the time of restore for each transaction log backup that is restored on the secondary database, especially if there are resource contention issues on the server. In this blog post, I shall outline how you can workaround the same in case you have to meet strict restore time SLAs.

Continue reading