Performing transaction log backups using AlwaysOn Availability Group read-only secondary replicas-Part2

Describe the log backup process on a ROSR

This is the second post in a planned series of articles relating to SQL Server transaction log backups of databases using a read-only secondary replica (ROSR).  Since the introduction of ROSRs in SQL Server 2012, administrators have been able to perform full database (copy only) backups as well as transaction log backups on any of the secondary replicas within the same Availability Group.  This allows the administrator to reduce or eliminate resource contention between production activity and backups. This article describes the steps and communications between the primary and secondary that take place when the secondary performs a backup.

What is the process for taking a backup on an ROSR?

The following is a high-level description showing the sequence of events that must take place on both the secondary and the primary to complete a successful backup. The valid types of backups that can be taken on an ROSR are: FULL and LOG. For FULL backups, the following are supported when using the COPY_ONLY option: DATABASE, FILE, or FILEGROUP. Differential backups are not supported against secondary replicas.

  1. The secondary first checks to see that the BACKUP command is not inside another transaction.
  2. The secondary then acquires a "backup" lock on the database so other sessions on the secondary cannot do a backup at the same time.
  3. The secondary sends a HADR message (HadrMsgTypeBackupSyncMsg) to the primary for it to request a "backup" lock.
  4. The primary acquires a "backup" lock if it can.
  5. The primary sends a HADR message (HadrMsgTypeBackupInfoMsg ) that includes the backup LSN of where to start (the "first_lsn" stored in msdb.dbo.backupset for this backup).
  6. The secondary performs the requested backup.
  7. The secondary sends a HADR message (HadrMsgTypeBackupInfoMsg ) back to the primary signifying backup completion and notifying the primary of the "last_lsn" backed up (the "last_lsn" stored in msdb.dbo.backupset for this backup).
  8. The primary issues a checkpoint.
  9. The primary releases the "backup" lock.
  10. The primary sends a HADR message (HadrMsgTypeBackupSyncMsg) which will signal the secondary the primary has checkpointed and released its lock.
  11. The secondary releases its "backup" lock.
  12. The secondary updates its MSDB database of the backup.

The next screen shot shows an Excel spreadsheet of the Xevents from both servers listed in order and labelled with the same step numbers as above so you can see in one comprehensive view the sequence of events.

Color Key

Description

Blue Signifies the event was captured on the Primary.
Orange Signifies the event was captured on the Secondary.
Yellow Messages sent from Secondary à Primary (along with the corresponding "recv" on the primary.
Light pink Messages sent from Primary à Secondary (along with the corresponding "recv" on the secondary.
Light blue Locks and checkpoints during the process.
Green TSQL Statements issued: "backup" & "insert"(into the msdb backupset table).

What do the XEvents look like for tracking this process?

The next screen shot is from the Extended Events viewer in SSMS for the Secondary. It shows most of the XEvents captured during the conversation. Some were omitted because they are not covered in this article.

And here we have a screen shot of the Xevents from the Primary's perspective. (Again a few have been filtered out for clarity of this article.)

Here let's focus on a subset of the Xevents that show the request from the secondary to the primary that essentially says – "I want to do a backup. Tell me if it's okay and where to start."

  • You can see the secondary "Send" the HadrMsgTypeBackupSyncMsg (step 3 from above) on the left. Correspondingly you can see the "Recv" of that same message in the primary's Xevents on the right.
  • Then in the blue square boxes you can see the primary on the right "Send" the backup LSN "00000028:00000090:0001", and the secondary "Recv" it on the left (Step 5 from above).
  • Finally, after the secondary completes the backup, in the black ovals we see the secondary "Send" the backup LSN "00000028:00000120:0001" to the primary, and the primary "Recv" it. This is Step 7 above.

Okay, I see the backup LSN go back and forth, how can I relate that to my backups and the LSNs I see in MSDB tables? They don't look the same.

In the Xevent screen shots above we saw two backup LSNs sent back and forth. The first was from the primary to the secondary: "00000028:00000090:0001". The second was from the secondary to the primary indicating where it finished: "00000028:00000120:0001". These two LSNs are the same exact LSNs as can be seen in MSDB.dbo.BackupSet for this backup – except the LSN in the Xevent is HEX based and the one in BackupSet is Decimal based.

This is where part 1 of this blog series comes in handy. If we take the various components of the HEX based LSN and convert them to decimal, we'll see they are the same. For example:

To convert 00000028:00000090:0001 to decimal, take each section and convert to decimal.

HEX (00000028) = 40 in decimal

HEX (00000090) = 144 in decimal – then pad left with 0s until 10 digits wide

HEX (0001) = 1 in decimal – then pad left with 0s until 5 digits wide

Concatenating these together we get one version of the decimal format:

40000000014400001

    Sometimes you will also see the decimal LSN in the following format – still with colon separators:

40:0000000144:00001

In Summary

This article has attempted to demonstrate the steps and communications that take place between the secondary and primary replicas when performing a log backup from the secondary replica.

Coming Up: Part 3: Various transaction log backup scenarios

The next post in this series will cover several scenarios when attempting to do log backups from secondary replicas: multiple secondary attempts to back up the transaction log, backing up the transaction log from a replica that is behind, attempting to back up the same LSNs that have already been backed up elsewhere – and others.

Comments

  • Anonymous
    October 22, 2016
    Great Post Trayce - Thank you for sharing !!Looking forward for 3rd part !
  • Anonymous
    October 28, 2016
    Thank you and you're welcome. Writing it now.
  • Anonymous
    February 25, 2017
    Very informative post!Would love to know when Part 3 is due......
    • Anonymous
      April 03, 2017
      same.. where is the 3rd part!
      • Anonymous
        April 28, 2017
        Should have part 3 out in a few weeks. Lots going on ;-)
  • Anonymous
    April 25, 2017
    Hi Trayce,What happened if we have separate tlog backup on both primary as well as secondary replica? Will their tlog LSN in sequence? or just relative to the local databases?Looking forward to part 3.thanks & regards,Irene
    • Anonymous
      April 28, 2017
      Log backups from any replica (primary or secondary) are part of a single log backup chain. If I backup to LSN 2000 on the primary and then the next successful log backup - wherever it is run, will start with LSN 2001. I say "successful" because if i try to do a log backup on a secondary that is behind -- say it's only at LSN 1800, "it" (i.e. the AG meta data for the backups on the database) will know that the log has already been backed up to LSN 2000 and will error out. So it's not possible to backup the same LSNs from multiple log backups. Also, from the perspective of "same single log chain" -- no matter where the log backup comes from -- they're all needed for restores - to continue the chain. let me know if this is still unclear. feel free to email me at trayce.jordan@microsoft.com.
  • Anonymous
    November 03, 2017
    Here want to know why checkpoint happens end of backup pls