Design 36: Disaster Recovery (ASR)

Summary

This design implements Azure Site Recovery (ASR) as the primary engine for Business Continuity.

Topology: ASR Vault sits in the Secondary Region (West US). It orchestrates replication from the Primary Spoke (East US) to the DR Spoke (West US).

1. Key Design Decisions (ADR)

ADR-01: Tool Selection

  • Decision: Azure Site Recovery.
  • Rationale: Native, simple, and cost-effective (about $25 per protected instance per month). Supports region-to-region replication of Azure VMs, with each source VNet mapped to a target VNet.

ADR-02: Target

  • Decision: Secondary Region (Paired).
  • Rationale: East US and West US are an Azure region pair. Replication traffic between them travels over Microsoft's backbone network.

2. High-Level Design (HLD)

                               PRIMARY REGION (East US)
                           +--------------------------+
                           |        SPOKE VNet        |
                           |        (Active)          |
                           |   [VM]                   |
                           +------------+-------------+
                                        |
                                        | (Replication Traffic)
                                        v
+-----------------------------------------------------------------------+
| SECONDARY REGION (West US)                                            |
|                                                                       |
|   +-----------------------+       +-----------------------+           |
|   | Recovery Services     |       | DR SPOKE VNet         |           |
|   | Vault                 |------>| (Empty Subnet)        |           |
|   | (Orchestrator)        |       |                       |           |
|   +-----------------------+       +-----------------------+           |
+-----------------------------------------------------------------------+

3. Low-Level Design (LLD)

+-----------------------------------------------------------------------+
| SOURCE: vnet-spoke-east                                               |
|   [VM: web-01]                                                        |
|     |                                                                 |
|     +-- (Mobility Service Agent) --> Sends Data                       |
|                                                                       |
|   [Cache Storage Account] (Staging)                                   |
+-----------------------------------------------------------------------+
                                        |
                                        v
+-----------------------------------------------------------------------+
| TARGET: West US                                                       |
|   [Recovery Services Vault]                                           |
|     |-- Replication Policy: 24hr Retention                            |
|     |-- Recovery Plan: Boot Order Group 1                             |
|                                                                       |
|   [vnet-spoke-west]                                                   |
|     |-- (Replica Disk)                                                |
|     +-- (Hydrated VM on Failover)                                     |
+-----------------------------------------------------------------------+

4. Component Rationale

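  • Mobility Service: Agent that captures disk writes. ASR installs it on each replicated VM as an extension when replication is enabled.
  • Cache Storage: Storage account in the source region that buffers captured writes before they are sent to the target region.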

5. Strategy: High Availability (HA)

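  • N/A: This *is* the DR solution. It protects against a regional outage; in-region HA (Availability Zones or Availability Sets) is out of scope for this design.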

6. Strategy: Disaster Recovery (DR)

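  • RPO (Recovery Point Objective): Minutes or less. Replication is continuous, and ASR creates crash-consistent recovery points every 5 minutes.
  • RTO (Recovery Time Objective): Minutes (time to create and boot the VM in West US).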
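To make the two objectives concrete, the Python sketch below measures the RPO and RTO actually achieved during one failover event, such as a drill. The function name and the timestamps are hypothetical. In practice, the recovery point times would come from the vault and the outage and restore times from your own monitoring.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def achieved_rpo_rto(
    outage_start: datetime,
    recovery_points: List[datetime],
    service_restored: datetime,
) -> Tuple[timedelta, timedelta]:
    """Return (RPO, RTO) for one failover event (hypothetical helper).

    RPO: data written after the newest recovery point taken before the outage is lost.
    RTO: elapsed time from the outage until the failed-over VM serves traffic again.
    """
    usable = [p for p in recovery_points if p <= outage_start]
    if not usable:
        raise ValueError("no recovery point exists before the outage")
    rto = service_restored - outage_start
    if rto < timedelta(0):
        raise ValueError("service_restored must not precede outage_start")
    rpo = outage_start - max(usable)
    return rpo, rto
```

For example, suppose recovery points were taken at 11:55, 12:00 and 12:05, the outage began at 12:03, and service was restored at 12:21. The function returns an RPO of 3 minutes and an RTO of 18 minutes.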

7. Strategy: Backup

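  • Distinction: ASR is for *continuity* (disks are continuously replicated to West US; VMs are created only on failover). Backup is for *archival* (long-term, point-in-time copies). ASR replicates accidental deletions and corruption along with good data, so you need both.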

8. Strategy: Security

  • Encryption: Data is encrypted in transit and at rest.

9. Well-Architected Framework Analysis

  • Reliability: Excellent.
  • Security: High.
  • Cost Optimization: High. You pay the ASR per-instance fee plus replica disk and cache storage, but you don't pay for compute in West US until you fail over.
  • Operational Excellence: High. "Test Failover" feature allows non-disruptive drills.
  • Performance Efficiency: N/A.
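The cost argument can be turned into a rough monthly estimate. The sketch below is a simplified model that assumes ADR-01's $25 per-instance fee is billed monthly. The storage and compute unit prices are hypothetical inputs you would take from current Azure pricing. Real managed-disk billing is per provisioned size tier rather than strictly per GB.

```python
from dataclasses import dataclass


@dataclass
class DrCostInputs:
    protected_vms: int
    asr_fee_per_vm_month: float             # ADR-01's $25 fee, assumed to be billed per month
    replica_disk_gb: float                  # total size of the replica managed disks in West US
    replica_disk_price_per_gb_month: float  # hypothetical unit price; simplified per-GB model
    cache_gb: float                         # average data held in the East US cache storage account
    cache_price_per_gb_month: float         # hypothetical unit price
    dr_compute_per_hour: float              # hypothetical combined West US VM cost while running
    failover_hours: float = 0.0             # hours DR VMs run this month (0 in steady state)


def monthly_dr_cost(c: DrCostInputs) -> float:
    """Rough monthly DR cost: license + storage, plus compute only while VMs run in West US."""
    license_fee = c.protected_vms * c.asr_fee_per_vm_month
    replica_storage = c.replica_disk_gb * c.replica_disk_price_per_gb_month
    cache_storage = c.cache_gb * c.cache_price_per_gb_month
    compute = c.dr_compute_per_hour * c.failover_hours  # zero until you fail over
    return license_fee + replica_storage + cache_storage + compute
```

Test failover drills also run VMs in West US, so count their hours in `failover_hours` as well.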

10. Detailed Traffic Flow

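1. Write: The VM writes to its disk.

2. Capture: The Mobility service agent captures the write.

3. Send: The agent sends the write to the cache storage account (East US).

4. Replicate: ASR processes the cached data, writes it to the replica disks in West US, and creates recovery points.

5. Failover: An admin clicks "Failover".

6. Boot: ASR creates the VM in West US, attaches the replica disks, and boots it.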
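The six steps can be illustrated with a toy model. This Python sketch is a hypothetical simulation of the concepts only: cache, replica, recovery points, retention and failover. It does not reflect how ASR is implemented internally and calls no Azure API. Times are integer minutes, and each `replicate` call creates one recovery point.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ReplicationSim:
    """Toy model of write -> cache -> replica -> recovery point -> failover (hypothetical)."""
    retention_minutes: int = 24 * 60  # mirrors the "24hr Retention" replication policy
    cache: List[Tuple[int, str, str]] = field(default_factory=list)  # East US staging
    replica: Dict[str, str] = field(default_factory=dict)            # West US replica disk
    recovery_points: List[Tuple[int, Dict[str, str]]] = field(default_factory=list)

    def write(self, t: int, block: str, data: str) -> None:
        # Steps 1-3: the VM writes a block, the agent captures it, and it is staged in the cache.
        self.cache.append((t, block, data))

    def replicate(self, now: int) -> None:
        # Step 4: apply staged writes to the replica in arrival order, then snapshot a recovery point.
        for _, block, data in self.cache:
            self.replica[block] = data
        self.cache.clear()
        self.recovery_points.append((now, dict(self.replica)))
        # Drop recovery points older than the retention window.
        cutoff = now - self.retention_minutes
        self.recovery_points = [(t, s) for t, s in self.recovery_points if t >= cutoff]

    def failover(self, at: int) -> Dict[str, str]:
        # Steps 5-6: boot from the newest recovery point taken at or before `at`.
        candidates = [(t, s) for t, s in self.recovery_points if t <= at]
        if not candidates:
            raise RuntimeError("no recovery point available")
        _, snapshot = max(candidates, key=lambda p: p[0])
        return dict(snapshot)
```

Consider a write of "v1" at minute 1, replicated at minute 5, followed by a write of "v2" at minute 7. A failover at minute 8 returns the "v1" state, because "v2" was still only in the cache. That gap is exactly what the RPO measures.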

11. Runbook: Deployment Guide (Azure Portal)

Phase 1: Create Recovery Services Vault

1. Search: "Recovery Services vaults" -> + Create.

2. Resource Group: rg-dr-west.

3. Name: rsv-dr-west.

4. Region: West US (Target Region).

5. Create.

Phase 2: Prepare Replication (Enable ASR)

1. Go to rsv-dr-west -> Site Recovery (Left Menu).

2. Enable replication for Azure virtual machines.

3. Source:

* Location: East US.

* Source subscription: Yours.

* Source Resource Group: rg-spoke-workload (Where your VMs are).

* Next.

4. Virtual Machines:

* Select the VMs you want to protect (e.g., web-01, db-01).

* Next.

5. Replication settings:

* Target location: West US.

* Target subscription: Yours.

* Target resource group: rg-spoke-workload-west (Create new if needed).

* Target virtual network: vnet-spoke-west (Create new if needed).

* Cache storage account: Select an existing storage account in East US, or let ASR create one automatically.

* Next.

6. Review + enable replication.
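The placement rules behind Phases 1 and 2 can be written down as a small pre-flight check. The rules are: target region differs from source, vault not in the source region, and cache storage in the source region. The class, function and region strings below are hypothetical and illustrative, not an Azure SDK API. The sketch only makes the rules explicit; it does not replace the portal's own validation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReplicationSettings:
    source_location: str
    target_location: str
    vault_location: str
    cache_storage_location: str
    target_resource_group: str
    target_vnet: str


def check_settings(s: ReplicationSettings) -> List[str]:
    """Hypothetical pre-flight check mirroring the choices made in Phases 1-2."""
    problems = []
    if s.target_location == s.source_location:
        problems.append("target location must be a different region from the source")
    if s.vault_location == s.source_location:
        problems.append("the vault must not be in the source region")
    if s.cache_storage_location != s.source_location:
        problems.append("the cache storage account must be in the source region")
    if not s.target_resource_group or not s.target_vnet:
        problems.append("target resource group and virtual network are required")
    return problems
```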

Phase 3: Monitor Sync

1. Go to Replicated items (Left Menu).

2. You will see your VMs listed.

3. Status:

* Enabling protection (0-10%).

* Synchronizing (Initial replication).

* Protected (Ready).

* *Note: Initial sync can take hours depending on disk size.*
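Two small helpers, both hypothetical, can support this phase. The first gates the drill on every VM reaching "Protected" (status strings as listed above). The second gives a back-of-the-envelope estimate for the initial sync: data volume divided by throughput. The effective throughput is an assumption you must supply; actual sync time also depends on disk churn and other factors this sketch ignores.

```python
from typing import Dict, List


def not_yet_protected(statuses: Dict[str, str]) -> List[str]:
    """VMs whose replication status is anything other than 'Protected' (sorted by name)."""
    return sorted(vm for vm, status in statuses.items() if status != "Protected")


def estimated_initial_sync_hours(gib_to_copy: float, throughput_mbps: float) -> float:
    """Rough initial-replication time: data volume divided by effective throughput.

    gib_to_copy:     disk data the initial replication must copy, in GiB.
    throughput_mbps: assumed effective replication throughput, in megabits per second.
    """
    if throughput_mbps <= 0:
        raise ValueError("throughput_mbps must be positive")
    megabits = gib_to_copy * 1024**3 * 8 / 1_000_000
    return megabits / throughput_mbps / 3600
```

For instance, 512 GiB at an assumed 100 Mbps works out to roughly 12.2 hours, which is why the note above warns that initial sync can take hours.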

Phase 4: Test Failover (Drill)

1. Click on a Protected VM (e.g., web-01).

2. Test Failover.

3. Recovery Point: Latest processed.

4. Azure VNet: Select vnet-spoke-west.

5. OK.

6. Verify:

* Go to West US Resource Group.

* You will see a new VM web-01-test.

* Log in to it (via a public IP or a jumpbox in West US).

* Verify app is running.

7. Cleanup test failover:

* Go back to Vault -> Replicated Items -> VM.

* Cleanup test failover.

* Notes: "Test complete".

* OK. (This deletes the test VM).
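After each drill, it helps to record what happened and review it. The sketch below is a hypothetical drill-review helper; the record fields are illustrative, not fields exported by ASR. It flags three problems: an application that was never verified, a drill that exceeded the RTO target, and a test failover that was not cleaned up, which would leave the `<vm>-test` VM running.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class DrillRecord:
    vm: str                           # e.g. "web-01"
    failover_started: datetime        # when Test Failover was started
    app_verified: Optional[datetime]  # when the app was confirmed running (None if never)
    cleaned_up: bool                  # whether "Cleanup test failover" was completed


def drill_findings(rec: DrillRecord, rto_target: timedelta) -> List[str]:
    """Hypothetical drill review: flags missed RTO, unverified apps and leftover test VMs."""
    findings = []
    if rec.app_verified is None:
        findings.append(f"{rec.vm}: application was never verified")
    elif rec.app_verified - rec.failover_started > rto_target:
        took = rec.app_verified - rec.failover_started
        findings.append(f"{rec.vm}: drill took {took}, above the {rto_target} RTO target")
    if not rec.cleaned_up:
        findings.append(f"{rec.vm}: test failover not cleaned up ({rec.vm}-test may still exist)")
    return findings
```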