
Design 44: Mission Critical (99.999%)

Summary

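This design implements a Mission Critical architecture targeting 99.999% availability, using independent deployment stamps that run active-active in two Azure regions.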

Topology: Zone-redundant stamps run in the Spoke VNet. The data plane is fully decoupled from the Hub so that the Hub cannot become a single point of failure (SPOF); the Hub is used only for management access.

1. Key Design Decisions (ADR)

ADR-01: Stamps

  • Decision: Deployment Stamps.
  • Rationale: Each unit (Stamp) is independent. If Stamp A fails, Stamp B takes over.

ADR-02: Global Distribution

  • Decision: Active-Active.
  • Rationale: Both regions (East US and West US) serve production traffic simultaneously.

2. High-Level Design (HLD)

                                  INTERNET
                                     |
                                     v
                           +--------------------+
                           |  Azure Front Door  |
                           |  (Global Router)   |
                           +---------+----------+
                                     |
                  /------------------+------------------\
                  |                                     |
                  v                                     v
        +------------------+                  +------------------+
        | Region A: East US|                  | Region B: West US|
        | (Stamp 1)        |                  | (Stamp 2)        |
        +------------------+                  +------------------+
        |   HUB VNet       |                  |   HUB VNet       |
        +------------------+                  +------------------+
                  |                                     |
        +------------------+                  +------------------+
        |  SPOKE VNet      |                  |  SPOKE VNet      |
        | [AKS Cluster]    |                  | [AKS Cluster]    |
        | [Cosmos DB]      |                  | [Cosmos DB]      |
        +------------------+                  +------------------+
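The availability arithmetic behind this topology can be sketched in a few lines: dependencies on the request path combine in series (multiply their availabilities), while the two regional stamps combine in parallel (the service is down only if both are down). The per-component figures below are assumed placeholders, not quoted Azure SLAs; substitute the currently published values before relying on the result.

```python
# Composite availability sketch. All SLA inputs are ASSUMED placeholders.

def serial(*availabilities: float) -> float:
    """Every component must be up: multiply the availabilities."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities: float) -> float:
    """Any one component is enough: 1 minus the chance that all are down."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


# Assumed inputs (illustrative only, not official figures).
FRONT_DOOR = 0.9999   # global entry point, shared by both regions
AKS_STAMP = 0.9995    # one zone-redundant regional AKS stamp
COSMOS_DB = 0.99999   # multi-region-write account, shared by both regions

stamps = parallel(AKS_STAMP, AKS_STAMP)            # East US or West US
end_to_end = serial(FRONT_DOOR, stamps, COSMOS_DB)

print(f"two stamps in parallel: {stamps:.7%}")
print(f"end to end:             {end_to_end:.5%}")
```

Under these assumed inputs the two parallel stamps contribute almost no downtime, so the end-to-end figure is bounded by the shared global components (Front Door and Cosmos DB).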

3. Low-Level Design (LLD)

                               PRIMARY REGION (East US)
+-----------------------------------------------------------------------+
| HUB VNet: vnet-hub (10.0.0.0/16)                                      |
|   +-----------------------+                                           |
|   | Bastion Host          |                                           |
|   +-----------|-----------+                                           |
|               |                                                       |
|               v (Peering)                                             |
+---------------|-------------------------------------------------------+
                |
+---------------|-------------------------------------------------------+
| SPOKE VNet: vnet-mission-critical (10.1.0.0/16)                       |
|   +-----------------------+                                           |
|   | Subnet: AKS           |                                           |
|   | [AKS Node (Zone 1)]   |                                           |
|   | [AKS Node (Zone 2)]   |                                           |
|   | [AKS Node (Zone 3)]   |                                           |
|   +-----------------------+                                           |
|   | Subnet: Data          |                                           |
|   | [Cosmos DB Endpoint]  |                                           |
|   +-----------------------+                                           |
+-----------------------------------------------------------------------+

                               SECONDARY REGION (West US)
+-----------------------------------------------------------------------+
| SPOKE VNet: vnet-mission-critical-west                                |
|   +--------------------------+                                        |
|   | [AKS Cluster (Zone 1-3)] |                                        |
|   +--------------------------+                                        |
+-----------------------------------------------------------------------+
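Peered VNets cannot have overlapping address spaces, so the hub and spoke ranges above must stay disjoint as the design grows. A minimal check using Python's standard `ipaddress` module, with the two ranges from the LLD (the West US spoke range is not given in this design, so it is omitted rather than guessed):

```python
import ipaddress

# Address spaces taken from the LLD. The West US spoke range is not
# specified in this design and is deliberately left out.
VNETS = {
    "vnet-hub": "10.0.0.0/16",
    "vnet-mission-critical": "10.1.0.0/16",
}


def overlapping_pairs(vnets: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of VNets whose address spaces overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in vnets.items()}
    names = sorted(nets)
    clashes = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if nets[a].overlaps(nets[b]):
                clashes.append((a, b))
    return clashes


print(overlapping_pairs(VNETS) or "no overlaps: peering is possible")
```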

4. Component Rationale

  • Cosmos DB: Chosen because it offers a 99.999% read and write availability SLA when the account spans multiple regions with multi-region writes enabled.

5. Strategy: High Availability (HA)

  • SLA target: 99.999% (five nines).
  • Zones: Every tier (AKS node pools and Cosmos DB regions) is deployed zone-redundant.
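To make the five-nines target concrete, the allowed downtime is (1 - availability) multiplied by the length of a year. A sketch assuming a 365.25-day year:

```python
# Downtime allowed by an availability target, assuming a 365.25-day year.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes per year the service may be unavailable at this target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for target in (0.999, 0.9999, 0.99999):
    print(f"{target:.3%}: {downtime_minutes_per_year(target):8.2f} min/year")
```

At 99.999% the budget is about 5.26 minutes per year, which is why failover has to be automatic rather than a manual runbook step.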

6. Strategy: Disaster Recovery (DR)

  • Implementation: Active-Active.
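  • Process: No manual failover. If East US becomes unavailable, Front Door stops routing to it once its health probes fail, and West US, which is provisioned for 100% of the load, serves all traffic.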

7. Strategy: Backup

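  • DR: Not needed; data is replicated globally.
  • Corruption: Replication copies bad writes and deletes to every region, so recover from corruption with Cosmos DB backups (periodic or continuous), not with replicas.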

8. Strategy: Security

  • DDoS: Azure DDoS Network Protection (formerly DDoS Protection Standard) enabled.
  • WAF: Front Door WAF enabled.

9. Well-Architected Framework Analysis

  • Reliability: Excellent. Active-active across two regions, zone-redundant within each.
  • Security: High.
  • Cost Optimization: Low. Each region is sized to carry the full load on its own, so you pay for roughly double capacity.
  • Operational Excellence: High. Automated deployments (Terraform).
  • Performance Efficiency: Excellent.
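The cost line above follows from sizing each region to carry the full load alone. A rough sizing sketch, assuming a hypothetical peak request rate and per-node throughput, and adding headroom so that a region still carries that load after losing one of its three zones:

```python
import math


def nodes_per_region(peak_rps: float, rps_per_node: float, zones: int = 3) -> int:
    """Nodes one region needs to carry the full global peak by itself,
    and still carry it after losing any single availability zone."""
    if zones < 2:
        raise ValueError("zone-loss headroom needs at least two zones")
    base = math.ceil(peak_rps / rps_per_node)   # nodes for 100% of global load
    per_zone = math.ceil(base / (zones - 1))    # surviving zones must cover it
    return per_zone * zones                     # spread evenly across zones


# Hypothetical inputs: 9,000 requests/s at peak, 500 requests/s per node.
print(nodes_per_region(9000, 500))
```

With these made-up inputs each region needs 27 nodes (18 for the load, rounded up to 9 per zone across 3 zones), so zone-loss headroom pushes the total above a strict doubling.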

10. Detailed Traffic Flow

1. User: Hits Front Door.

2. Route: Front Door routes to the lowest-latency healthy origin (for example, East US).

3. Ingress: Hits AKS Ingress Controller.

4. App: Pod processes request.

5. Data: Writes to the local Cosmos DB region.

6. Sync: Cosmos DB replicates the write asynchronously to West US.
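The routing decision in step 2 can be modeled roughly on Front Door's documented selection order: keep only healthy origins, then those with the highest priority (lowest value), then those within the latency sensitivity band of the fastest one, then choose by weight. This is a simplified sketch: real Front Door measures latency per edge location, and the `Origin` fields and sample numbers are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Origin:
    name: str
    healthy: bool
    priority: int        # lower value = preferred
    latency_ms: float    # as measured from the requesting edge location
    weight: int


def pick_origin(origins, latency_sensitivity_ms=0.0, rng=None):
    """Simplified model: health -> priority -> latency band -> weight."""
    chooser = rng or random
    healthy = [o for o in origins if o.healthy]
    if not healthy:
        raise RuntimeError("no healthy origin")
    top = min(o.priority for o in healthy)
    candidates = [o for o in healthy if o.priority == top]
    fastest = min(o.latency_ms for o in candidates)
    in_band = [o for o in candidates
               if o.latency_ms <= fastest + latency_sensitivity_ms]
    return chooser.choices(in_band, weights=[o.weight for o in in_band], k=1)[0]


origins = [
    Origin("aks-east", healthy=True, priority=1, latency_ms=20, weight=1000),
    Origin("aks-west", healthy=True, priority=1, latency_ms=70, weight=1000),
]
print(pick_origin(origins).name)   # East has the lowest latency
origins[0].healthy = False         # simulate an East US outage
print(pick_origin(origins).name)
```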

11. Runbook: Deployment Guide (Azure Portal)

Phase 1: Deploy Stamps (AKS Clusters)

1. East US:

* Create AKS aks-east.

* Enable Availability Zones: 1, 2, 3.

* Enable Cluster Autoscaler.

2. West US:

* Create AKS aks-west.

* Enable Availability Zones: 1, 2, 3.

* Enable Cluster Autoscaler (West US must be able to absorb 100% of traffic).
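For repeatability, the Phase 1 settings can also be expressed as Azure CLI calls. The sketch below only builds and prints `az aks create` commands rather than running them; the resource group names and node counts are hypothetical, and the flags should be confirmed with `az aks create --help` for your CLI version. Both stamps get identical settings because either region may have to carry the full load.

```python
import shlex


def aks_create_args(name: str, region: str, resource_group: str,
                    min_nodes: int = 3, max_nodes: int = 9) -> list[str]:
    """Build (but do not run) a zone-redundant, autoscaling AKS create call.
    Node counts are hypothetical; min_nodes=3 gives one node per zone."""
    return [
        "az", "aks", "create",
        "--resource-group", resource_group,
        "--name", name,
        "--location", region,
        "--zones", "1", "2", "3",
        "--node-count", str(min_nodes),
        "--enable-cluster-autoscaler",
        "--min-count", str(min_nodes),
        "--max-count", str(max_nodes),
    ]


for name, region in (("aks-east", "eastus"), ("aks-west", "westus")):
    # Hypothetical resource group naming: rg-<cluster name>.
    print(shlex.join(aks_create_args(name, region, f"rg-{name}")))
```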

Phase 2: Deploy Global Data (Cosmos DB)

1. Search: "Azure Cosmos DB" -> + Create.

2. API: NoSQL.

3. Name: cosmos-global-mission.

4. Region: East US.

5. Create.

6. Replicate:

* Go to Replicate data globally.

* Add West US.

* Multi-region writes: Enable (required for the 99.999% write SLA).

* Save.
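With multi-region writes enabled, two regions can accept conflicting writes to the same item. Cosmos DB's default conflict resolution policy is last-writer-wins on the `_ts` system property (a custom numeric path can be configured instead). The toy model below illustrates that rule only; it is not the service's implementation, the documents are hypothetical, and in this sketch ties go to the first argument.

```python
def last_writer_wins(*versions: dict, path: str = "_ts") -> dict:
    """Toy model: keep the version with the highest value at `path`.
    max() returns the first of equal maxima, so ties go to the first argument."""
    return max(versions, key=lambda doc: doc[path])


# Hypothetical conflicting versions of one item, written in each region.
east = {"id": "order-42", "status": "paid", "_ts": 1700000010}
west = {"id": "order-42", "status": "cancelled", "_ts": 1700000007}

print(last_writer_wins(east, west)["status"])
```

The losing write is discarded, so items that are updated concurrently from both regions deserve a deliberate resolution strategy.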

Phase 3: Deploy Global Router (Front Door)

1. Search: "Front Door and CDN profiles" -> + Create.

2. Tier: Premium (recommended; adds WAF managed rule sets and Private Link origins) or Standard.

3. Endpoint: mission-critical-app.

4. Origin Group:

* Add Origin 1: aks-east (Public IP of Ingress Controller).

* Add Origin 2: aks-west (Public IP of Ingress Controller).

* Health Probes: HEAD /healthz.

5. Create.
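Each stamp must answer the `HEAD /healthz` probe configured above. Front Door treats an HTTP 200 response as a healthy probe, so the endpoint should return 200 only when the stamp can actually serve traffic. A minimal standard-library sketch; `dependencies_ok()` is a hypothetical hook for real checks, and in AKS this handler would sit behind the ingress controller:

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def dependencies_ok() -> bool:
    # Hypothetical hook: replace with real checks (e.g. a cheap Cosmos DB read).
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def _respond(self, send_body: bool) -> None:
        if self.path != "/healthz":
            self.send_response(404)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        ok = dependencies_ok()
        body = b"ok" if ok else b"unhealthy"
        self.send_response(200 if ok else 503)   # only 200 counts as healthy
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if send_body:
            self.wfile.write(body)

    def do_HEAD(self) -> None:   # Front Door probe method in this design
        self._respond(send_body=False)

    def do_GET(self) -> None:    # handy for manual checks
        self._respond(send_body=True)

    def log_message(self, format, *args) -> None:
        pass                     # keep probe traffic out of the logs


def serve(port: int = 8080) -> None:
    ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Call `serve()` to start it on port 8080 (an assumed port).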

Phase 4: Deploy Application

1. Deploy your App to aks-east. Configure it to connect to the cosmos-global-mission account endpoint with East US as the preferred region.

2. Deploy your App to aks-west. Configure it to connect to the cosmos-global-mission account endpoint with West US as the preferred region.
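Rather than hard-coding a regional endpoint, each stamp can point at the same account and list its own region first. A sketch of that per-stamp setting, where `STAMP_REGION` is a hypothetical environment variable set on each cluster. With the `azure-cosmos` Python SDK this list would typically be passed to `CosmosClient` as `preferred_locations`; confirm the keyword for your SDK version.

```python
import os

# Regions used by this design, as Azure display names.
STAMP_REGIONS = ["East US", "West US"]


def preferred_regions(local_region: str) -> list[str]:
    """Local region first, then the remaining regions in listed order."""
    if local_region not in STAMP_REGIONS:
        raise ValueError(f"unknown stamp region: {local_region!r}")
    return [local_region] + [r for r in STAMP_REGIONS if r != local_region]


# STAMP_REGION is a hypothetical per-cluster environment variable.
local = os.environ.get("STAMP_REGION", "East US")
print(preferred_regions(local))
```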

Phase 5: Verify Failover

1. Browser: Hit the Front Door URL. A client near East US is served from East (lowest latency).

2. Simulate Failure: Stop the AKS Nodes in East US.

3. Refresh: Front Door detects failure (missed probes) and routes to West US.

4. Data: Check Cosmos DB. Data written in East is visible in West.
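The detection in step 3 can be reasoned about with a sliding window: Front Door considers an origin healthy while at least a required number of the most recent probe samples succeeded (the origin group's "sample size" and "successful samples required" settings). A sketch assuming a sample size of 4 and 3 required successes:

```python
from collections import deque


class OriginHealth:
    """Healthy while at least `required` of the last `sample_size` probes succeeded."""

    def __init__(self, sample_size: int = 4, required: int = 3):
        self.window = deque([True] * sample_size, maxlen=sample_size)  # start healthy
        self.required = required

    def record(self, success: bool) -> bool:
        """Add one probe result and return the resulting health state."""
        self.window.append(success)
        return self.healthy

    @property
    def healthy(self) -> bool:
        return sum(self.window) >= self.required


east = OriginHealth()
# aks-east nodes stopped: every probe from now on fails.
results = [east.record(False) for _ in range(3)]
print(results)
```

Under these assumed settings, the second consecutive failed probe marks aks-east unhealthy, and three consecutive successes are needed before it receives traffic again.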