Design 25: Cosmos DB (Global Distribution)

Summary

This design implements Azure Cosmos DB as a globally distributed, multi-model database.

Topology: The Cosmos DB account is a global PaaS resource. It is accessed securely via Private Endpoints from the Spoke VNet (where the App resides).

1. Key Design Decisions (ADR)

ADR-01: Consistency Level

  • Decision: Session Consistency (Default).
  • Rationale: Best balance of performance and data accuracy. Within a session, a client always reads its own writes; other sessions may briefly see older data until replication catches up.
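
To make the read-your-writes guarantee concrete, here is a minimal sketch of the idea behind a session token. It is a toy model, not Cosmos DB's actual implementation: the `ToyAccount` and `Replica` classes, the two-replica layout, and the use of a plain counter as the token are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    lsn: int = 0                               # highest write sequence number applied
    data: dict = field(default_factory=dict)


class ToyAccount:
    """Hypothetical two-replica store illustrating session consistency."""

    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()
        self._pending = []                     # writes not yet shipped to the secondary

    def write(self, key, value):
        """Apply a write on the primary and return a session token (its LSN)."""
        self.primary.lsn += 1
        self.primary.data[key] = value
        self._pending.append((self.primary.lsn, key, value))
        return self.primary.lsn

    def replicate(self):
        """Ship all pending writes to the secondary (asynchronous in a real system)."""
        for lsn, key, value in self._pending:
            self.secondary.data[key] = value
            self.secondary.lsn = lsn
        self._pending.clear()

    def read(self, key, session_token=0):
        """Serve from the secondary only if it has caught up to the session token."""
        if self.secondary.lsn >= session_token:
            return self.secondary.data.get(key), "secondary"
        return self.primary.data.get(key), "primary"


account = ToyAccount()
token = account.write("task-1", "Test")

own_read = account.read("task-1", session_token=token)  # the writer's own session
other_read = account.read("task-1")                      # a session holding no token
account.replicate()
later_read = account.read("task-1")                      # after replication catches up
```

In this model the writer's session never sees stale data because its token forces the read to a replica that is current enough, while a session without a token can still be served the older state.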

ADR-02: Connectivity

  • Decision: Private Link.
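  • Rationale: Critical for security. Traffic to the DB stays on the private network, and public internet access is blocked once public network access is disabled on the account.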

2. High-Level Design (HLD)

+--------------+           +--------------------------+           +--------------+
|  Global User |           |        HUB VNet          |           |  SPOKE VNet  |
|              |           |      (DNS Resolver)      |           |  (Workload)  |
+------+-------+           +------------+-------------+           +------+-------+
       |                                |                                |
       v                                | (Peering)                      |
+------+-------+                        v                                v
|  Front Door  |           +------------+-------------+           +------+-------+
|  (Routing)   |---------->| Private DNS Zone         |<----------|  App / API   |
+--------------+           | (privatelink.documents)  |           |  (Compute)   |
                           +--------------------------+           +------+-------+
                                                                         |
                                                                         v
                                                                  +--------------+
                                                                  |  Cosmos DB   |
                                                                  |  (Private)   |
                                                                  +--------------+

3. Low-Level Design (LLD)

                               PRIMARY REGION (East US)
+-----------------------------------------------------------------------+
| HUB VNet: vnet-hub (10.0.0.0/16)                                      |
|   +-----------------------+                                           |
|   | Private DNS Zone      |                                           |
|   | (privatelink.documents)|                                          |
|   +-----------|-----------+                                           |
|               |                                                       |
|               v (Peering)                                             |
+---------------|-------------------------------------------------------+
                |
+---------------|-------------------------------------------------------+
| SPOKE VNet: vnet-cosmos-spoke (10.1.0.0/16)                           |
|   +-----------------------+       +-----------------------+           |
|   | Subnet: App           |       | Subnet: PrivateLink   |           |
|   | [Web API VM]          |------>| [Private Endpoint]    |           |
|   | (Processes Request)   |       | (10.1.1.5)            |           |
|   +-----------------------+       +-----------|-----------+           |
+-----------------------------------------------|-----------------------+
                                                |
                                                v
                                    +-----------------------+
                                    | Cosmos DB Account     |
                                    | (Write Region: East)  |
                                    +-----------------------+

                                      |
                                      | (Global Replication)
                                      v

                               SECONDARY REGION (West US)
+-----------------------------------------------------------------------+
| DR SPOKE VNet                                                         |
|   +-----------------------+                                           |
|   | [Web API DR]          |                                           |
|   | (Reads from West)     |                                           |
|   +-----------------------+                                           |
|               |                                                       |
|               v                                                       |
|   +-----------------------+                                           |
|   | Cosmos DB Replica     |                                           |
|   | (Read Region: West)   |                                           |
|   +-----------------------+                                           |
+-----------------------------------------------------------------------+

4. Component Rationale

  • Multi-Region Writes: (Optional) Allows writing to both East and West. Costs more, but extends the 99.999% availability SLA to writes as well as reads.

5. Strategy: High Availability (HA)

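  • SLA: 99.99% for a single region. 99.999% read availability for multi-region accounts (write availability reaches 99.999% only with multi-region writes).
  • Failover: Automatic when Service-Managed Failover is enabled.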
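
The difference between the two SLA figures is easier to judge as allowed downtime per year. The short calculation below is plain arithmetic on the percentages above; the function name is chosen for illustration, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Maximum downtime per year that still meets the given availability SLA."""
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)


for sla in (99.99, 99.999):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.2f} min/year")
# 99.99% -> 52.56 min/year
# 99.999% -> 5.26 min/year
```

Moving from four nines to five nines cuts the downtime budget by a factor of ten, from just under an hour to about five minutes per year.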

6. Strategy: Disaster Recovery (DR)

  • Implementation: Enable Multi-Region.
  • Process:

* Add West US as a read region.

* Enable Service-Managed Failover.

* If East US becomes unavailable, Azure promotes West US to Write Region automatically.
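
The promotion step can be sketched as a priority-ordered selection. This is a toy model only: Azure's real failover logic is internal to the service, and the `Region` class and `promote_write_region` function are hypothetical names introduced for this illustration. The assumption is that priority 0 marks the write region and the healthy region with the lowest priority number takes over.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    failover_priority: int   # 0 = write region in this model
    healthy: bool = True


def promote_write_region(regions):
    """Return a new region list in which the best healthy region has priority 0."""
    by_priority = sorted(regions, key=lambda r: r.failover_priority)
    healthy = [r for r in by_priority if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy region available to promote")
    unhealthy = [r for r in by_priority if not r.healthy]
    # Healthy regions keep their relative order; failed regions move to the back.
    return [Region(r.name, i, r.healthy) for i, r in enumerate(healthy + unhealthy)]


regions = [Region("East US", 0), Region("West US", 1)]
regions[0].healthy = False                 # simulate the East US outage
after_failover = promote_write_region(regions)
write_region = after_failover[0].name
```

With East US marked unhealthy, `write_region` becomes West US and East US drops to the lowest priority until it recovers.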

7. Strategy: Backup

  • Continuous Backup: Point-in-time restore (up to 30 days).
  • Periodic Backup: The older mode (a full backup every 4 hours by default). Prefer Continuous.

8. Strategy: Security

  • Firewall: "Selected Networks" only.
  • RBAC: Use Microsoft Entra ID (formerly Azure AD) RBAC for data plane access (disable key-based authentication with the primary/secondary keys if possible).

9. Well-Architected Framework Analysis

  • Reliability: Excellent.
  • Security: High.
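  • Cost Optimization: Low. Provisioned Throughput (RU/s) is billed even when idle, and every added region is billed as well. Serverless mode suits sporadic workloads, but check its region support before combining it with geo-replication.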
  • Operational Excellence: High.
  • Performance Efficiency: Excellent. The SLA covers <10ms latency at the 99th percentile for point reads and writes served from the local region.
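
Because provisioned throughput is billed for every region the account replicates to, adding West US roughly doubles the throughput bill. The sketch below shows that arithmetic only. The rate used is a hypothetical placeholder, not a real Azure price, and storage, backup, and any multi-region-write surcharge are deliberately left out.

```python
HOURS_PER_MONTH = 730  # common billing approximation


def monthly_provisioned_cost(ru_per_sec, regions, price_per_100_ru_hour):
    """Throughput-only estimate: RU/s are billed per 100 RU/s, per hour, per region."""
    return (ru_per_sec / 100) * price_per_100_ru_hour * HOURS_PER_MONTH * regions


# HYPOTHETICAL rate for illustration only; look up the current price list.
PLACEHOLDER_RATE = 0.01

single_region = monthly_provisioned_cost(1000, regions=1, price_per_100_ru_hour=PLACEHOLDER_RATE)
two_regions = monthly_provisioned_cost(1000, regions=2, price_per_100_ru_hour=PLACEHOLDER_RATE)
```

Whatever the actual rate, `two_regions` is always twice `single_region` in this model, which is the point to keep in mind when sizing RU/s for a geo-replicated account.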

10. Detailed Traffic Flow

1. User: Sends request to Front Door.

2. Route: Front Door routes to Web API in East US.

3. Process: API processes logic.

4. Query: API queries Cosmos DB via Private Endpoint (10.1.1.5).

5. Replication: Cosmos DB engine replicates data to West US asynchronously.

6. Read: User in West US hits Front Door -> Web API DR -> Reads from West US replica (<10ms latency).
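
For steps 4 and 6, each API instance needs to tell the SDK which region to try first. The sketch below assumes the `azure-cosmos` and `azure-identity` packages are installed and that Entra ID authentication is used as recommended in the Security section. The helper names `preferred_regions_for` and `make_client` are hypothetical, and the endpoint is supplied by the caller.

```python
def preferred_regions_for(app_region, all_regions=("East US", "West US")):
    """Order regions so the app's own region is tried first, the others as fallback."""
    if app_region not in all_regions:
        raise ValueError(f"unknown region: {app_region}")
    return [app_region] + [r for r in all_regions if r != app_region]


def make_client(endpoint, app_region):
    """Build a Cosmos client that reads from the nearest replica.

    The SDK imports are kept inside the function so the ordering helper
    above can be used without the Azure packages installed.
    """
    from azure.cosmos import CosmosClient
    from azure.identity import DefaultAzureCredential

    return CosmosClient(
        endpoint,
        credential=DefaultAzureCredential(),
        consistency_level="Session",
        preferred_locations=preferred_regions_for(app_region),
    )


east_order = preferred_regions_for("East US")
west_order = preferred_regions_for("West US")
```

The Web API in East US would call `make_client(endpoint, "East US")` and the DR instance `make_client(endpoint, "West US")`, so each one reads locally and falls back to the other region only when needed.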

11. Runbook: Deployment Guide (Azure Portal)

Phase 1: Create Private DNS Zone (in Hub)

1. Search: "Private DNS zones" -> + Create.

2. Resource Group: rg-hub-dns.

3. Name: privatelink.documents.azure.com (Exact name for Cosmos).

4. Create.

5. Link to Hub:

* Go to Zone -> Virtual network links -> + Add.

* Name: link-to-hub.

* Virtual network: vnet-hub.

* OK.

6. Link to Spoke:

* + Add.

* Name: link-to-spoke.

* Virtual network: vnet-cosmos-spoke.

* OK.

Phase 2: Create Cosmos DB

1. Search: "Azure Cosmos DB" -> + Create.

2. API: Azure Cosmos DB for NoSQL (formerly Core/SQL).

3. Resource Group: rg-design25-cosmos.

4. Account Name: cosmos-global-corp-[uniqueid].

5. Location: East US.

6. Capacity mode: Provisioned. (Serverless is the cheapest option for labs, but confirm that it supports multiple regions before relying on it for Phase 3.)

7. Global Distribution:

* Geo-Redundancy: Enable.

* Multi-region writes: Disable.

8. Networking:

* Connectivity method: Private endpoint.

* Add private endpoint:

* Name: pe-cosmos.

* Subscription/Resource Group: rg-design25-cosmos.

* Location: East US.

* Target sub-resource: Sql (for NoSQL API).

* Virtual Network: vnet-cosmos-spoke.

* Subnet: snet-workload.

* Integrate with private DNS zone: Yes.

* Private DNS Zone: privatelink.documents.azure.com.

9. Review + create -> Create.

Phase 3: Add Region

1. Go to Cosmos Account -> Replicate data globally.

2. Click on the map (e.g., West US).

3. Save. (This starts data replication).

Phase 4: Verify

1. Login to a VM in the Spoke.

2. Nslookup: nslookup cosmos-global-corp-[uniqueid].documents.azure.com.

* Result should be 10.1.x.x (Private IP).

3. Data Explorer:

* Go to Portal -> Data Explorer.

* New Container -> Database ToDoList -> Container Items.

* Add Item: {"id": "1", "task": "Test"}.

4. Verify Replication:

* Wait a few minutes.

* The data is now in West US too.
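
The nslookup check in step 2 can also be scripted from the spoke VM. The sketch below uses only the Python standard library and assumes the spoke address space from the LLD (10.1.0.0/16). The function names are illustrative, and the account hostname is passed in by the caller.

```python
import ipaddress
import socket

SPOKE_CIDR = ipaddress.ip_network("10.1.0.0/16")  # spoke VNet range from the LLD


def in_spoke(ip):
    """True if the address falls inside the spoke VNet range."""
    return ipaddress.ip_address(ip) in SPOKE_CIDR


def resolves_privately(hostname):
    """Resolve the hostname and check that every IPv4 answer is a spoke address."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(in_spoke(a) for a in addresses)
```

Calling `resolves_privately` with the account's `documents.azure.com` hostname should return `True` when the private DNS zone is linked correctly. A `False` result suggests the name is still resolving to a public address.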