
Design 45: Data Mesh

Summary

This design implements a Data Mesh. Instead of one centralized Data Lake, each business domain (e.g., Marketing, Sales) owns and manages its own data products.

Topology: Hub-and-spoke. A central Data Governance Hub (Microsoft Purview) connects to multiple Data Spokes, one per domain.
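Azure VNet peering is non-transitive: the hub can reach each spoke it is peered with, but two spokes peered only to the hub cannot reach each other through it. A minimal Python model of that rule, assuming a hypothetical second spoke named vnet-sales alongside vnet-marketing:

```python
# Minimal model of hub-and-spoke VNet peering reachability.
# Assumption: "vnet-sales" is a hypothetical second spoke for illustration.
from itertools import combinations

# Each peering is a direct, bidirectional link between two VNets.
PEERINGS = {
    frozenset({"vnet-hub", "vnet-marketing"}),
    frozenset({"vnet-hub", "vnet-sales"}),
}


def can_reach(src: str, dst: str) -> bool:
    """Peering is non-transitive: only directly peered VNets can talk
    (extra routing through the hub is not modeled here)."""
    return src == dst or frozenset({src, dst}) in PEERINGS


if __name__ == "__main__":
    vnets = ["vnet-hub", "vnet-marketing", "vnet-sales"]
    for a, b in combinations(vnets, 2):
        print(f"{a} <-> {b}: {can_reach(a, b)}")
```

This is why the governance hub can scan every spoke while the domains stay network-isolated from each other by default.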

1. Key Design Decisions (ADR)

ADR-01: Decentralization

  • Decision: Domain-Oriented Architecture.
  • Rationale: Removes the bottleneck of a central IT data team.

ADR-02: Governance

  • Decision: Federated Governance (Purview).
  • Rationale: Allows central discovery while keeping data ownership local.
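Federated governance splits responsibility: the hub owns discovery, while each domain keeps ownership of its products. A minimal Python sketch of that split, using hypothetical class and product names (it models the policy only and is not the Purview API):

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    owner_domain: str  # e.g. "marketing"; ownership stays with the domain
    description: str = ""


@dataclass
class GovernanceHub:
    """Hypothetical central catalog: holds metadata only, never the data."""
    products: dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, domain: str, product: DataProduct) -> None:
        # A domain may only publish products it owns...
        if product.owner_domain != domain:
            raise PermissionError(f"{domain} cannot publish {product.name}")
        existing = self.products.get(product.name)
        # ...and may not overwrite another domain's product.
        if existing and existing.owner_domain != domain:
            raise PermissionError(f"{product.name} is owned by {existing.owner_domain}")
        self.products[product.name] = product

    def search(self, term: str) -> list[str]:
        # Central discovery: anyone can search across all domains.
        term = term.lower()
        return sorted(
            p.name for p in self.products.values()
            if term in p.name.lower() or term in p.description.lower()
        )
```

For example, after `hub.publish("marketing", DataProduct("campaign-results", "marketing"))`, any analyst can find it with `hub.search("campaign")`, but a `publish` of the same name by `"sales"` raises `PermissionError`.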

2. High-Level Design (HLD)

+--------------+           +--------------------------+           +--------------+
|  Data User   |           |        HUB VNet          |           |  SPOKE VNet  |
|  (Analyst)   |           |      (Governance)        |           |  (Marketing) |
+------+-------+           +------------+-------------+           +------+-------+
       |                                |                                |
       v                                | (Peering)                      |
+------+-------+                        v                                v
|  Power BI    |           +------------+-------------+           +------+-------+
|  (Report)    |---------->| Azure Purview            |<--------->|  Synapse     |
+--------------+           | (Catalog)                |           |  Workspace   |
                           +--------------------------+           +------+-------+
                                                                         |
                                                                         v
                                                                  +--------------+
                                                                  |  Data Lake   |
                                                                  |  (ADLS Gen2) |
                                                                  +--------------+

3. Low-Level Design (LLD)

                               PRIMARY REGION (East US)
+-----------------------------------------------------------------------+
| HUB VNet: vnet-hub (10.0.0.0/16)                                      |
|   +-----------------------+                                           |
|   | Azure Purview         |                                           |
|   | (Private Endpoint)    |                                           |
|   +-----------|-----------+                                           |
|               |                                                       |
|               v (Peering)                                             |
+---------------|-------------------------------------------------------+
                |
+---------------|-------------------------------------------------------+
| SPOKE VNet: vnet-marketing (10.1.0.0/16)                              |
|   +-----------------------+                                           |
|   | Synapse Workspace     |                                           |
|   | [Spark Pool]          |                                           |
|   | [SQL Pool]            |                                           |
|   +-----------------------+                                           |
|   | Storage Account       |                                           |
|   | [Container: Raw]      |                                           |
|   | [Container: Curated]  |                                           |
|   +-----------------------+                                           |
+-----------------------------------------------------------------------+

                               SECONDARY REGION (West US)
+-----------------------------------------------------------------------+
| DR SPOKE VNet                                                         |
|   +-----------------------+                                           |
|   | Synapse (DR)          |                                           |
|   | (Workspace Only)      |                                           |
|   +-----------------------+                                           |
+-----------------------------------------------------------------------+

4. Component Rationale

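  • Purview: The "Google for your data". It scans the data sources in every spoke and builds a unified, searchable Data Map of their assets.
  • Synapse: The processing engine (pipelines, Spark, and SQL) that transforms data inside each domain's spoke.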

5. Strategy: High Availability (HA)

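  • N/A: Batch processing. An interrupted run can be retried, so no always-on redundant compute is deployed; regional outages are handled by DR.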

6. Strategy: Disaster Recovery (DR)

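  • Implementation: Geo-Redundant Storage (GRS) for the Data Lake; Synapse is redeployed from Git.
  • Process:

* The Data Lake replicates asynchronously to West US (GRS).

* The Synapse workspace is treated as stateless: pipelines, notebooks, and scripts live in Git. Dedicated SQL pool data is the exception; restore it from the pool's geo-backup.

* In a disaster, fail over the storage account to West US, deploy Synapse in West US from Git, and point it at the replicated data.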

7. Strategy: Backup

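  • Data: Enable soft delete (blobs and containers) and blob snapshots on the Data Lake. GRS is not a backup: accidental deletes and overwrites replicate to West US as well.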
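Soft delete keeps a deleted blob restorable for a configurable retention period (1 to 365 days), after which it is purged. A minimal Python sketch of that window, assuming a hypothetical 14-day policy:

```python
from datetime import datetime, timedelta, timezone

# Assumption: hypothetical retention setting; Azure allows 1-365 days.
RETENTION_DAYS = 14


def is_recoverable(deleted_at: datetime, now: datetime,
                   retention_days: int = RETENTION_DAYS) -> bool:
    """A soft-deleted blob can be undeleted until its retention window ends."""
    if not 1 <= retention_days <= 365:
        raise ValueError("soft delete retention must be 1-365 days")
    return now < deleted_at + timedelta(days=retention_days)


if __name__ == "__main__":
    deleted = datetime(2024, 3, 1, tzinfo=timezone.utc)
    print(is_recoverable(deleted, deleted + timedelta(days=10)))  # True
    print(is_recoverable(deleted, deleted + timedelta(days=15)))  # False
```

Choose the retention period so it comfortably exceeds the time it would take to notice an accidental delete.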

8. Strategy: Security

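  • Access: Azure RBAC at the storage account or container level, plus POSIX-style ACLs on Data Lake folders and files for fine-grained access.
  • Network: Private Endpoints for every PaaS service (Purview, Synapse, Storage).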
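With hierarchical namespace enabled, ADLS Gen2 evaluates POSIX-style ACLs: reading a file requires Execute on every folder from the container root down to the file's parent, plus Read on the file itself. A simplified Python model of that rule for a single principal, assuming hypothetical paths and permissions and ignoring RBAC (which Azure evaluates before ACLs), masks, and group membership:

```python
# Simplified model of ADLS Gen2 ACL checks for one principal.
# Assumptions: hypothetical paths/permissions; no RBAC, masks, or default ACLs.
from pathlib import PurePosixPath

# Effective permissions granted to one user or group on each path.
ACLS = {
    "/": "--x",
    "/curated": "--x",
    "/curated/sales": "r-x",
    "/curated/sales/2024.csv": "r--",
    "/raw": "---",
}


def can_read_file(path: str, acls: dict[str, str] = ACLS) -> bool:
    """Reading a file needs Execute on every ancestor folder
    (from the root down) and Read on the file itself."""
    p = PurePosixPath(path)
    for folder in reversed(p.parents):  # "/", then "/curated", ...
        if "x" not in acls.get(str(folder), "---"):
            return False
    return "r" in acls.get(str(p), "---")
```

Granting Execute on the parent folders without Read lets a principal traverse to a specific file without being able to list the folders' contents.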

9. Well-Architected Framework Analysis

  • Reliability: High.
  • Security: High.
  • Cost Optimization: Medium. Dedicated SQL pools bill for compute while online, so pause them when idle; configure auto-pause on Spark pools.
  • Operational Excellence: High.
  • Performance Efficiency: Excellent.
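The advice to pause SQL pools can be quantified in compute hours. A minimal Python sketch, assuming Azure's usual 730-hour month and a hypothetical schedule of 10 hours a day on 22 working days; actual prices depend on the pool's DWU level and region:

```python
# Compare billable compute hours for a dedicated SQL pool that runs 24x7
# versus one paused outside a working schedule.
# Assumptions: 730 hours/month and a hypothetical 10h x 22-day schedule.
HOURS_PER_MONTH = 730


def billable_hours(hours_per_day: float, days_per_month: int) -> float:
    return hours_per_day * days_per_month


def savings_pct(scheduled_hours: float, always_on: float = HOURS_PER_MONTH) -> float:
    """Percentage of compute hours avoided by pausing (storage is still billed)."""
    return round(100 * (1 - scheduled_hours / always_on), 1)


if __name__ == "__main__":
    scheduled = billable_hours(10, 22)
    print(scheduled, savings_pct(scheduled))  # 220 69.9
```

Under these assumptions, pausing outside working hours avoids roughly 70% of compute hours.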

10. Detailed Traffic Flow

1. Ingest: The Marketing team drops a CSV file into the Raw container.

2. Process: A Synapse pipeline is triggered (e.g., by a storage event on the Raw container) and cleans the data.

3. Store: The pipeline writes the cleaned data to the Curated container.

4. Register: Purview scans the Curated container and adds "Sales Data" to the catalog.

5. Consume: An analyst searches Purview, finds "Sales Data", and connects Power BI to it.
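The five steps above can be sketched as an in-memory Python simulation. It is illustrative only: plain dicts stand in for the Raw and Curated containers and the Purview catalog, and the cleaning rule (trim whitespace, drop blank rows) is an assumption, not the actual pipeline logic:

```python
# In-memory simulation of Ingest -> Process -> Store -> Register -> Consume.
# Assumptions: dicts replace storage and Purview; the cleaning rule is hypothetical.
import csv
import io

raw: dict[str, str] = {}                        # Raw container
curated: dict[str, list[dict[str, str]]] = {}   # Curated container
catalog: dict[str, str] = {}                    # asset name -> Data Lake path


def ingest(name: str, csv_text: str) -> None:   # 1. Ingest
    raw[name] = csv_text


def process(name: str) -> None:                 # 2. Process + 3. Store
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw[name])):
        # Trim whitespace; missing trailing fields come back as None.
        values = {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        if any(values.values()):                # drop rows that are entirely blank
            cleaned.append(values)
    curated[name] = cleaned


def scan() -> None:                             # 4. Register
    for name in curated:
        catalog[name] = f"curated/{name}"


def search(term: str) -> list[str]:             # 5. Consume
    return [n for n in catalog if term.lower() in n.lower()]
```

Note that only the Curated container is scanned, so raw, uncleaned files never surface in the catalog.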

11. Runbook: Deployment Guide (Azure Portal)

Phase 1: Create Spoke VNet (Marketing Domain)

1. Search: "Virtual networks" -> + Create.

2. Resource Group: rg-marketing.

3. Name: vnet-marketing.

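4. Region: East US.

5. IP address space: 10.1.0.0/16 (must not overlap the hub's 10.0.0.0/16).

6. Create.

7. Peer to vnet-hub: open vnet-marketing -> Peerings -> + Add, and select vnet-hub as the remote virtual network.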
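Peering fails if the two VNets' address spaces overlap, which is why the hub (10.0.0.0/16) and the Marketing spoke (10.1.0.0/16) use distinct ranges. A minimal pre-flight check using Python's standard ipaddress module; the vnet-sales range in the example is hypothetical:

```python
import ipaddress

# Address spaces taken from the LLD.
ADDRESS_SPACES = {
    "vnet-hub": "10.0.0.0/16",
    "vnet-marketing": "10.1.0.0/16",
}


def overlapping_pairs(spaces: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of VNets whose address spaces overlap.

    Peering requires non-overlapping ranges, so any pair listed here
    cannot be peered as designed.
    """
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in spaces.items()}
    names = sorted(nets)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if nets[a].overlaps(nets[b])
    ]


if __name__ == "__main__":
    print(overlapping_pairs(ADDRESS_SPACES))  # []
    # A hypothetical new spoke carved out of the hub's range is flagged:
    print(overlapping_pairs({**ADDRESS_SPACES, "vnet-sales": "10.0.128.0/24"}))
    # [('vnet-hub', 'vnet-sales')]
```

Running a check like this before adding each new domain spoke catches address conflicts before the portal rejects the peering.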

Phase 2: Create Data Lake

1. Search: "Storage accounts" -> + Create.

2. Resource Group: rg-marketing.

3. Name: dlsmarketing[uniqueid].

4. Redundancy: GRS (Geo-Redundant).

5. Advanced:

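* Hierarchical namespace: Enabled. (Critical: this makes the account ADLS Gen2, which Synapse requires for its primary storage and which enables folder-level ACLs.)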

6. Create.

7. Containers: Create raw and curated.
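Storage account names must be 3-24 characters of lowercase letters and digits and globally unique, so "dlsmarketing" (12 characters) leaves 12 characters for [uniqueid]. A minimal Python sketch that validates a name and fills the suffix, assuming a random hex suffix is an acceptable stand-in for [uniqueid]:

```python
import re
import secrets

# Azure storage account names: 3-24 characters, lowercase letters and digits
# only, globally unique (uniqueness can only be confirmed by Azure itself).
_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_name(name: str) -> bool:
    """Check the local naming rules (not global uniqueness)."""
    return _NAME_RE.fullmatch(name) is not None


def make_storage_name(prefix: str = "dlsmarketing") -> str:
    """Fill the rest of the 24-character budget with a random hex suffix.

    Assumption: a random suffix stands in for [uniqueid]; it can still
    collide with an existing account, so creation may need a retry.
    """
    if not re.fullmatch(r"[a-z0-9]{1,23}", prefix):
        raise ValueError(f"prefix must be 1-23 lowercase letters/digits: {prefix!r}")
    suffix_len = 24 - len(prefix)
    # token_hex(12) yields 24 lowercase hex characters; keep what fits.
    return prefix + secrets.token_hex(12)[:suffix_len]
```

Hyphens, which the Synapse workspace name in Phase 3 uses, are not allowed here, which is why the Data Lake name has none.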

Phase 3: Create Synapse Workspace

1. Search: "Azure Synapse Analytics" -> + Create.

2. Resource Group: rg-marketing.

3. Workspace name: syn-marketing-[uniqueid].

4. Select Data Lake Storage Gen2: account dlsmarketing[uniqueid], plus a file system (container) for the workspace's default storage.

5. Create.

Phase 4: Create Purview (Governance Hub)

1. Search: "Microsoft Purview accounts" -> + Create.

2. Resource Group: rg-hub-prod.

3. Name: purview-corp-hub.

4. Create.

Phase 5: Register Source in Purview

1. Open Microsoft Purview Governance Portal.

2. Go to Data Map -> Sources.

3. Register.

4. Select Azure Synapse Analytics.

5. Select syn-marketing-[uniqueid].

6. Register.

7. New Scan:

* Click the source -> New scan.

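* Credential: Use the Purview account's managed identity. Beforehand, grant it Reader on the Synapse workspace, Storage Blob Data Reader on dlsmarketing[uniqueid], and read access inside the SQL databases to be scanned.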

* Run scan.

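8. Wait for the scan to finish. A Synapse scan catalogs the workspace's SQL databases; to catalog files in the Curated container, also register dlsmarketing[uniqueid] as an Azure Data Lake Storage Gen2 source and scan it the same way.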

Phase 6: Verify Catalog

1. Go to Data Catalog (in Purview).

2. Search: "marketing".

3. You should see the tables and files from the Marketing domain.