This design implements a Data Mesh. Instead of one central Data Lake owned by a single team, each domain (Marketing, Sales) manages its own data products.
Topology: A Data Governance Hub (Purview) connects to multiple Data Spokes.
+--------------+           +--------------------------+           +--------------+
|  Data User   |           |         HUB VNet         |           |  SPOKE VNet  |
|  (Analyst)   |           |       (Governance)       |           | (Marketing)  |
+------+-------+           +------------+-------------+           +------+-------+
       |                                |                                |
       v                                |           (Peering)            |
+------+-------+                        v                                v
|   Power BI   |           +------------+-------------+           +------+-------+
|   (Report)   |---------->|      Azure Purview       |<--------->|   Synapse    |
+--------------+           |        (Catalog)         |           |  Workspace   |
                           +--------------------------+           +------+-------+
                                                                         |
                                                                         v
                                                                  +--------------+
                                                                  |  Data Lake   |
                                                                  | (ADLS Gen2)  |
                                                                  +--------------+
PRIMARY REGION (East US)
+-----------------------------------------------------------------------+
| HUB VNet: vnet-hub (10.0.0.0/16)                                      |
|   +-----------------------+                                           |
|   |     Azure Purview     |                                           |
|   |  (Private Endpoint)   |                                           |
|   +-----------|-----------+                                           |
|               |                                                       |
|               v (Peering)                                             |
+---------------|-------------------------------------------------------+
                |
+---------------|-------------------------------------------------------+
| SPOKE VNet: vnet-marketing (10.1.0.0/16)                              |
|   +-----------------------+                                           |
|   |   Synapse Workspace   |                                           |
|   |     [Spark Pool]      |                                           |
|   |      [SQL Pool]       |                                           |
|   +-----------------------+                                           |
|   |    Storage Account    |                                           |
|   |   [Container: Raw]    |                                           |
|   | [Container: Curated]  |                                           |
|   +-----------------------+                                           |
+-----------------------------------------------------------------------+
SECONDARY REGION (West US)
+-----------------------------------------------------------------------+
| DR SPOKE VNet                                                         |
|   +-----------------------+                                           |
|   |     Synapse (DR)      |                                           |
|   |   (Workspace Only)    |                                           |
|   +-----------------------+                                           |
+-----------------------------------------------------------------------+
* Data Lake replicates to West US (GRS).
* Synapse Workspace is stateless (code in Git).
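* In a disaster, deploy Synapse in West US from Git and point it to the replicated data. With plain GRS the secondary copy becomes accessible only after an account failover; RA-GRS allows reads from the secondary before failover.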
1. Ingest: Marketing team drops CSV into Raw container.
2. Process: A Synapse Pipeline triggers and cleans the data.
3. Store: Writes clean data to Curated container.
4. Register: Purview scans Curated and adds "Marketing Data" to the catalog.
5. Consume: An analyst searches Purview, finds "Marketing Data", and connects Power BI to it.
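Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration, not the pipeline itself: local directories stand in for the Raw and Curated containers, and both the `clean_csv` name and the cleaning rules (trim whitespace, drop rows with a missing field, drop exact duplicates) are assumptions, since the flow above does not say what "cleans" involves.

```python
import csv
from pathlib import Path


def clean_csv(raw_path: Path, curated_path: Path) -> int:
    """Copy a CSV from raw to curated, applying assumed cleaning rules.

    Returns the number of data rows written (header excluded).
    """
    curated_path.parent.mkdir(parents=True, exist_ok=True)
    seen = set()
    written = 0
    with raw_path.open(newline="", encoding="utf-8") as src, \
            curated_path.open("w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input: nothing to write
        writer.writerow([col.strip() for col in header])
        for row in reader:
            cleaned = tuple(cell.strip() for cell in row)
            # Assumed rules: skip malformed rows, rows with an empty
            # field, and rows that were already written once.
            if len(cleaned) != len(header) or "" in cleaned or cleaned in seen:
                continue
            seen.add(cleaned)
            writer.writerow(cleaned)
            written += 1
    return written
```

In the real design the same logic would run inside a Synapse pipeline activity or Spark notebook against the raw and curated containers rather than local paths.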
1. Search: "Virtual networks" -> + Create.
2. Resource Group: rg-marketing.
3. Name: vnet-marketing.
4. Region: East US.
5. Create.
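6. Peer to vnet-hub. Peering requires non-overlapping address spaces, so give vnet-marketing 10.1.0.0/16 (the hub uses 10.0.0.0/16).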
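VNet peering is rejected when the two address spaces overlap, so it can be worth checking the plan before creating anything. The sketch below uses Python's standard `ipaddress` module; the `overlapping_vnets` helper is a hypothetical name, and the CIDR ranges are the ones shown in the topology diagram.

```python
import ipaddress
from itertools import combinations


def overlapping_vnets(vnets: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of VNet names whose address spaces overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in vnets.items()}
    return [
        (a, b)
        for a, b in combinations(networks, 2)
        if networks[a].overlaps(networks[b])
    ]


# Address spaces taken from the topology diagram.
planned = {"vnet-hub": "10.0.0.0/16", "vnet-marketing": "10.1.0.0/16"}

# An empty list means no pair of VNets would block peering.
print(overlapping_vnets(planned))
```

Adding each new spoke (for example a Sales VNet) to `planned` before deployment catches a clash early.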
1. Search: "Storage accounts" -> + Create.
2. Resource Group: rg-marketing.
3. Name: dlsmarketing[uniqueid].
4. Redundancy: GRS (Geo-Redundant).
5. Advanced:
* Hierarchical namespace: Enabled. (Critical: this is what makes the account ADLS Gen2.)
6. Create.
7. Containers: Create raw and curated.
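The portal choices above map onto a handful of settings. The sketch below is a rough illustration, not a deployment script: it validates the account name against Azure's naming rule (3 to 24 characters, lowercase letters and digits only), builds an ARM-style request body with the GRS and hierarchical namespace settings, and shows the `abfss://` URIs Spark would use for the two containers. The helper names and the `001` suffix standing in for `[uniqueid]` are assumptions.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def storage_account_body(name: str, location: str = "eastus") -> dict:
    """Build an ARM-style request body for a GRS account with ADLS Gen2 enabled."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return {
        "location": location,
        "kind": "StorageV2",
        "sku": {"name": "Standard_GRS"},       # step 4: geo-redundant storage
        "properties": {"isHnsEnabled": True},  # step 5: hierarchical namespace
    }


def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Return the abfss:// URI that addresses a path in an ADLS Gen2 container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


account = "dlsmarketing001"  # hypothetical; "001" stands in for [uniqueid]
body = storage_account_body(account)
uris = [abfss_uri(account, c) for c in ("raw", "curated")]
```

Validating the name up front matters because a hyphen or uppercase letter, which is fine in a resource group or Synapse workspace name, is rejected for storage accounts.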
1. Search: "Azure Synapse Analytics" -> + Create.
2. Resource Group: rg-marketing.
3. Workspace name: syn-marketing-[uniqueid].
4. Select Data Lake: dlsmarketing[uniqueid].
5. Create.
1. Search: "Microsoft Purview accounts" -> + Create.
2. Resource Group: rg-hub-prod.
3. Name: purview-corp-hub.
4. Create.
1. Open Microsoft Purview Governance Portal.
2. Go to Data Map -> Sources.
3. Register.
4. Select Azure Synapse Analytics.
5. Select syn-marketing-[uniqueid].
6. Register.
7. New Scan:
* Click the source -> New scan.
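* Credential: Use Managed Identity. The Purview managed identity must already have read access to the workspace, or the scan will fail.
* Run scan.
8. Wait: Purview crawls the Synapse workspace. To catalog the files in the Data Lake as well, register the storage account as its own Azure Data Lake Storage Gen2 source and scan it the same way.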
1. Go to Data Catalog (in Purview).
2. Search: "marketing".
3. You should see the tables and files from the Marketing domain.