Killing two birds with one stone: SharePoint HA and DR with stretch farm, and everything you want to know about it

A single SharePoint farm running across multiple data centers is called a “stretch farm”. Using a stretched SharePoint farm, you can provide fault tolerance by following the standard guidance for making databases and service applications redundant. Therefore you can achieve high availability and disaster recovery at the same time.

You can follow this article to set up a stretch farm using SQL mirroring: https://technet.microsoft.com/en-us/library/dd207314.aspx

The following Q&As are based on my real-world experience with SQL mirroring and SharePoint stretch farms.

Questions and Answers (DC stands for Data Center):  

  • What are the requirements for setting up a SharePoint stretch farm?
    For a stretched farm to work, the one-way latency between all the SQL Servers and the front-end Web servers must be less than 1 millisecond, and the bandwidth must be at least 1 gigabit per second.

  • Where to put the witness server?
    Put the witness server in the most reliable DC. If the secondary DC is more reliable and the connection between the DCs is at least as reliable as the DCs themselves, you can put the witness in the secondary DC; otherwise, it should be in the primary DC.

  • What are the minimum firewall requirements for the witness server?
    Open up the TCP port of the SQL mirroring endpoint for inbound traffic. 
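
    To find out which port that is, you can query the endpoint catalog view on each partner and on the witness. This is only a sketch and assumes the mirroring endpoints have already been created:

    ```sql
    -- List the database mirroring endpoint and the TCP port it listens on.
    -- An empty result means no mirroring endpoint exists on this instance yet.
    SELECT name, port, state_desc
    FROM sys.tcp_endpoints
    WHERE type_desc = N'DATABASE_MIRRORING';
    ```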

  • Can I use SQL Express as the witness?
    Yes, you can.
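
    A witness only needs a database mirroring endpoint in the WITNESS role. The sketch below shows roughly what that could look like on the Express instance. The endpoint name and port 5022 are assumptions for illustration, so substitute your own values:

    ```sql
    -- Run on the witness instance (for example, SQL Server Express).
    -- "MirroringWitness" and port 5022 are placeholder choices, not requirements.
    CREATE ENDPOINT MirroringWitness
        STATE = STARTED
        AS TCP (LISTENER_PORT = 5022)
        FOR DATABASE_MIRRORING (ROLE = WITNESS);
    ```

    The service accounts of the two partner instances also need CONNECT permission on this endpoint before they can reach the witness.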

  • How to test a stretch farm?
    You can simulate a failed SQL server by stopping its service using SQL Server Management Studio. Open the studio, connect to the SQL instance, right-click the instance name in the left pane and select the Stop command. If the SQL instance is the principal, the mirror SQL instance will take over the principal role and the SharePoint farm will come back to service within a few seconds.
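
    To confirm which instance holds which role before and after such a test, you can query the mirroring catalog view on either partner. A minimal sketch, using only the built-in sys.database_mirroring view:

    ```sql
    -- Run on either partner: returns one row per mirrored database.
    SELECT DB_NAME(database_id)        AS database_name,
           mirroring_role_desc,          -- PRINCIPAL or MIRROR
           mirroring_state_desc,         -- e.g. SYNCHRONIZED, DISCONNECTED
           mirroring_witness_state_desc  -- CONNECTED, DISCONNECTED or UNKNOWN
    FROM sys.database_mirroring
    WHERE mirroring_guid IS NOT NULL;    -- skip databases that are not mirrored
    ```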

  • How to manually fail over from one DC to another DC?
    Run the following T-SQL command on the principal SQL server instance. You need to run the command for each database.

    ALTER DATABASE <your database name> SET PARTNER FAILOVER
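
    With many SharePoint databases, typing the command once per database gets tedious. One possible shortcut is to let SQL Server generate the statements for you. This is a sketch rather than a tested procedure, and the column alias is just illustrative:

    ```sql
    -- Run on the current principal instance.
    -- Produces one failover statement per synchronized, mirrored database.
    -- Review the output, then execute it from the master database.
    SELECT N'ALTER DATABASE ' + QUOTENAME(DB_NAME(database_id))
           + N' SET PARTNER FAILOVER;' AS failover_statement
    FROM sys.database_mirroring
    WHERE mirroring_role_desc = N'PRINCIPAL'
      AND mirroring_state_desc = N'SYNCHRONIZED';
    ```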

  • What to do when I lose the primary DC?
    When the witness server is in the primary DC and the whole DC is gone (a disaster happens), the mirror SQL instance will become "Principal - In Recovery" but won't serve data. To bring it back online so your SharePoint services resume, run the following T-SQL commands. You have to do this for all of your databases.

    ALTER DATABASE <your database name> SET PARTNER OFF

    RESTORE DATABASE <your database name> WITH RECOVERY

    The commands above break the mirroring partnership. You have to back up and restore the databases and re-establish the partnership after the primary DC is recovered.
    If the witness server is in the secondary DC, the mirror SQL instance will automatically become the principal server and your SharePoint farm will resume service within a few seconds.
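
    For the case where you do have to break the partnership by hand, the two recovery commands can be generated for every database in one go. A hedged sketch, assuming the surviving instance still lists its mirroring sessions in sys.database_mirroring:

    ```sql
    -- Run on the surviving (former mirror) instance after the primary DC is lost.
    -- Generates both recovery statements for every database that still has a
    -- mirroring session configured. Review before executing: this breaks mirroring.
    SELECT N'ALTER DATABASE ' + QUOTENAME(DB_NAME(database_id)) + N' SET PARTNER OFF; '
         + N'RESTORE DATABASE ' + QUOTENAME(DB_NAME(database_id)) + N' WITH RECOVERY;'
           AS recovery_statements
    FROM sys.database_mirroring
    WHERE mirroring_guid IS NOT NULL;
    ```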

  • What to do when the primary DC is recovered after a disaster?
    Assume you already ran the commands to break the mirroring partnership. Follow the steps below to re-establish it. Note that the principal SQL instance is in the secondary DC at this point:
    1. On the principal server, back up all SharePoint databases.
    2. Copy the backup files to the mirror database server. The server should be running in stand-alone mode.
    3. Delete all the SharePoint databases from the mirror database server if they are still present.
    4. Restore the databases to the mirror server, leaving them in the restoring state (WITH NORECOVERY).
    5. On the mirror server, set up the mirroring partnership.
    6. On the principal server, set up the mirroring partnership.
    7. On the principal server, set up the witness partnership.
    8. Test SharePoint and make sure it is still functioning.
    9. Fail over to the original principal server if necessary.
    If the SQL instance in the secondary DC automatically took over the principal role, you don't have to do anything except give the two instances some time to sync up.
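
    The manual steps above map to T-SQL roughly as follows, shown for a single database. This is a sketch under assumptions: the database name WSS_Content, the backup share, the host names and port 5022 are all hypothetical placeholders, and the database is assumed to use the full recovery model.

    ```sql
    -- Steps 1-2, on the principal: take a full backup and a log backup.
    BACKUP DATABASE [WSS_Content] TO DISK = N'\\fileshare\sqlbackup\WSS_Content.bak' WITH INIT;
    BACKUP LOG [WSS_Content] TO DISK = N'\\fileshare\sqlbackup\WSS_Content.trn' WITH INIT;

    -- Step 3, on the mirror: drop the stale copy if it is still present.
    -- DROP DATABASE [WSS_Content];

    -- Step 4, on the mirror: restore WITH NORECOVERY so the database stays in
    -- the restoring state, which mirroring requires.
    RESTORE DATABASE [WSS_Content] FROM DISK = N'\\fileshare\sqlbackup\WSS_Content.bak' WITH NORECOVERY;
    RESTORE LOG [WSS_Content] FROM DISK = N'\\fileshare\sqlbackup\WSS_Content.trn' WITH NORECOVERY;

    -- Step 5, on the mirror: point at the principal's mirroring endpoint.
    ALTER DATABASE [WSS_Content] SET PARTNER = 'TCP://principal.contoso.local:5022';

    -- Step 6, on the principal: point at the mirror's mirroring endpoint.
    ALTER DATABASE [WSS_Content] SET PARTNER = 'TCP://mirror.contoso.local:5022';

    -- Step 7, on the principal: add the witness.
    ALTER DATABASE [WSS_Content] SET WITNESS = 'TCP://witness.contoso.local:5022';
    ```

    The order matters: the partner has to be set on the mirror first and on the principal second, which is why steps 5 and 6 are listed that way.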

  • What happens if I lose connection between the primary and the secondary data centers?
    Assume both DCs are still working but the connection between them is broken. Both SQL instances will become principal. The one with access to the witness server will keep serving data; the one without access to the witness will stop serving data, and the SharePoint servers in that DC will stop working.

  • Can SQL mirroring work with clusters?
    Mirroring can work between two SQL clusters, or between a SQL cluster and a single-server SQL instance. Mirroring, however, does not work within a SQL cluster.

  • What happens when my principal cluster fails over?
    A SQL cluster failover always takes longer than the SQL mirroring timeout interval. Therefore, the SQL mirroring roles will be switched when the principal cluster fails over. You can fail back to the original cluster after the SQL cluster failover is complete.
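
    If you would rather let the cluster finish its own failover before mirroring reacts, one option is to raise the mirroring partner timeout (10 seconds by default) above the time your cluster needs to fail over. A sketch only: the database name and the 90-second value are assumptions, not recommendations.

    ```sql
    -- Run on the principal, once per mirrored database.
    -- WSS_Content and 90 seconds are placeholders; size the timeout from the
    -- cluster failover times you actually measure in your environment.
    ALTER DATABASE [WSS_Content] SET PARTNER TIMEOUT 90;

    -- Check the value currently in effect (in seconds).
    SELECT DB_NAME(database_id) AS database_name, mirroring_connection_timeout
    FROM sys.database_mirroring
    WHERE mirroring_guid IS NOT NULL;
    ```

    The trade-off is that a longer timeout also delays automatic failover when the principal really is gone.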

  • Does a stretch farm affect SharePoint performance?
    According to my limited performance tests, no.

  • How do I set up failover for SharePoint features that do not have a UI for setting up failover instances?
    You can run the following PowerShell commands in the SharePoint Management Shell on a SharePoint server:

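    # Look up the database by name (replace the placeholder with your own).
    $db = Get-SPDatabase | Where-Object { $_.Name -eq "<database name>" }

    # Register the mirror SQL instance as the failover server, then save the change.
    $db.AddFailoverServiceInstance("<failover SQL instance name>")

    $db.Update()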

  • Does a stretch farm protect all SharePoint functionality?
    No. It does not protect functionality with dependencies beyond the 14 hive and the databases. Examples of unsupported dependencies are file shares and external data sources. Third-party SharePoint solutions without failover capabilities will not be protected either. Contact your vendors for more information.

Zewei Song, Ph.D.
MCPD, MCITP, MCTS: SharePoint 2010, .NET 3.5
Enterprise Services, Microsoft Corporation

Comments

  • Anonymous
    July 27, 2011
    Would you have step-by-step instructions for setting up SP HA & DR?

  • Anonymous
    July 28, 2011
    Please refer to this article for the detailed steps: technet.microsoft.com/.../ff628962.aspx Z

  • Anonymous
    August 20, 2011
    Thanks for sharing such a nice article.

  • Anonymous
    August 24, 2011
    Very nice summary. One thing I've realized is that the 1ms latency requirement puts an absolute distance limit of 186 miles between data centers (and practically much less than that - figure real world more than 100 miles is going to break you). The problem, of course, is that using two data centers within 100 miles of each other for HA & DR isn't really a best practice - they will have the same weather problems, may be on the same power grid, etc. Any thoughts about how useful this practice really is, given this limitation?

  • Anonymous
    August 25, 2011
    True. However, I have seen customers using this solution with data centers within 20, 10 and even 2 miles. It is not always feasible to have real geo-distributed DR locations.

  • Anonymous
    August 28, 2011
    Hi, I want to create a stretch farm. Can you please give more details on the WFE and application server side? Suppose I have 2 WFEs, 1 application server and 1 index server, and the same servers need to be configured in the secondary data center. How do I configure that? How do I configure another application server and index server in the secondary data center? Thanks

  • Anonymous
    August 30, 2011
    Excellent article. Your real-world experience is another inspiration for me to proceed with the plan to use the stretch farm configuration for my DR plan. I've tried (repeatedly) testing this setup using a multi-server farm staging environment and a single-server development environment (as the mirror) and was not successful. After the (simulated) primary data center failed, I get "unable to connect to the configuration database". I suspect it may be due to the mirror environment being a single-server farm? I would appreciate any feedback and/or instructions. Thanks.

  • Anonymous
    September 06, 2011
    Hi Santosh Kanse, you can configure WFE and other SP servers in the DR datacenter however you like. For example, you can have 1 WFE and 1 App in the DR DC, or make it the same as your Production DC. For search, you can configure Query servers in the DR DC so in case of emergency, your users can still run search queries. It really depends on your DR requirements. Dr. Z

  • Anonymous
    September 06, 2011
    Hi Version2zero, I am not very clear on how you set up your farm. But if you are seeing the "unable to connect to the config db" error, then your SharePoint servers do not have access to the SQL. Also, please note this solution is to configure ONE farm across datacenters, not to configure multiple farms. Dr. Z

  • Anonymous
    September 07, 2011
    Hi Dr. Z, thanks a lot for your reply. I have a question about the application servers:
    1. Can I create multiple Central Admin sites on different ports, i.e. a Central Admin site in DC1 on port 8080 and another Central Admin site in DC2 on port 8081, so that if DC1 is down the administrator can access CA from the DC2 server? Or is there anything else I need to do?
    2. What about the service applications? Do I need to configure the same service applications on the DC2 application server that are configured on the DC1 application server, or do I only need to configure the services once? Please clarify. Thanks for your time. Santosh K.

  • Anonymous
    September 08, 2011
    Hello Dr. Z. I have the following setup for DR:
      • a STG environment with 3 servers (2 WFEs + 1 App Server + 1 db server)
      • a STG-DR environment with an identical configuration (2 WFEs + 1 App Server + 1 db server)
      • installed SharePoint and created a vanilla SP farm on STG
      • joined the WFEs on STG-DR to the SharePoint farm on STG, making it a stretch farm
      • then mirrored all SP databases between STG and STG-DR
      • confirmed the SP farm is up and running
      • then ran a PowerShell script to set the failover server to the STG-DR db server
      • failed over the dbs and shut down STG, so STG-DR is now my primary
    When I browse, SP does not come up; I get a 500 error. I ran the SharePoint Configuration Wizard on each of the WFEs and saw that the name of the db server is still the name of the STG environment's db server (which is now shut down). I then changed it to the name of the db server on STG-DR and browsed SP, which gave me another error: "cannot connect to the configuration database". Am I missing something? Thanks for your response.

    1. Can I create multiple Central Admin sites on different ports, i.e. a Central Admin site in DC1 on port 8080 and another Central Admin site in DC2 on port 8081, so that if DC1 is down the administrator can access CA from the DC2 server? Or is there anything else I need to do? ANSWER: You can only have one CA per SharePoint farm. In the case of a stretched farm, since the config and CA content databases are mirrored, you are safe when the original CA is blown away. You can use the SharePoint Configuration Wizard to change the CA's host, or you can use the "New-SPCentralAdministration" (technet.microsoft.com/.../ff607841.aspx) command to do it.
    2. What about the service applications? Do I need to configure the same service applications on the DC2 application server that are configured on the DC1 application server, or do I only need to configure the services once? ANSWER: Most SharePoint service applications can run on multiple servers at the same time, and the Topology service is in charge of load balancing them, so most of them are safe and you don't have to do anything when you lose some servers in one DC. However, for Search and Profile, you may need to change their topology settings (Search) and/or provision new services (Profile and Profile Sync).

  • Anonymous
    October 04, 2011
    Hi Version2Zero, it seems to me that there may be some networking issues. Please make sure that from your STG environment you have access to the STG-DR database servers and vice versa. If there are firewalls between them, please refer to this article for the list of the ports and protocols you need to open: technet.microsoft.com/.../cc262849.aspx Good luck!

  • Anonymous
    November 16, 2011
    Hi Dr. Z Thanks for your reply. I configured the farm as you said.

  • Anonymous
    May 03, 2012
    I just want to set up a farm like the one in the link above. Would you please tell me what issues or impact I might run into when setting this up? Are there any considerations in terms of network and bandwidth, or any recommendations? My data centers are geographically distributed.

  • Anonymous
    May 03, 2012
    Make sure you have at least 1 Gbps of bandwidth and less than 1 ms of one-way latency.

  • Anonymous
    January 30, 2015
    Hello Dr. Z, I would like to understand any performance impact when using a stretched farm across 2 data centers in a geographically dispersed environment. With synchronous mirroring the log has to be hardened at the secondary data center, and this could possibly impact the primary data center even with > 1G bandwidth when there is a lot of data to be committed to disk on the primary site. Would you still think a stretched farm setup is ideal for DR and HA in such a situation? Thanks for your time.