
  +============================================================+
  |                                                            |
  |      AZURE ANALYTICS FUNDAMENTALS - PART 2                 |
  |                                                            |
  |  Data Lakes  -  Data Warehouses  -  Modern Architecture    |
  |                                                            |
  |                 Powered by HitaVir Tech                    |
  +============================================================+

Welcome to Fundamentals of Analytics on Azure Cloud Platform - Part 2 by HitaVir Tech!

In Part 1 you built the mental model: analytics concepts, the 5 Vs, and the Azure services that solve each V. In Part 2 you will zoom out and learn how those services combine into production architectures used by real companies today.

Where Part 1 Ends and Part 2 Begins

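  +------------------------------------------+---------------------------------------------------+
  |  Part 1 - The Ingredients                |  Part 2 - The Recipe                              |
  +------------------------------------------+---------------------------------------------------+
  |  🧠 What is analytics / ML?              |  🏛 What is a data warehouse?                     |
  |  📐 The 5 Vs diagnostic                  |  🪣 What is a data lake?                          |
  |  ☁ One service per V                    |  🧩 How they combine into a Lakehouse             |
  |  🛠 ADLS -> Synapse Serverless mini lab  |  🗺 Reference architectures for 6 real use cases  |
  +------------------------------------------+---------------------------------------------------+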

What You Will Master

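  +-------------------------------+----------------------------------------------------------------------+
  |  Pillar                       |  Topics                                                              |
  +-------------------------------+----------------------------------------------------------------------+
  |  🪣 Data Lakes                |  What, why, zones, governance on Azure                               |
  |  🏛 Data Warehouses           |  Columnar MPP, star schemas, Synapse Dedicated                       |
  |  🏡 Modern Data Architecture  |  Lakehouse + Microsoft Fabric                                        |
  |  ☁ Azure Services            |  ADLS Gen2, Synapse, Databricks, Purview, ADF, Event Hubs, Power BI  |
  |  🗺 Reference Architectures   |  Batch BI, Streaming, ML, Log analytics, 360° customer, Data mesh    |
  |  🎯 Common Use Cases          |  When to pick which pattern                                          |
  +-------------------------------+----------------------------------------------------------------------+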

Services You Will Meet in Part 2

ADLS Gen2, Synapse, Databricks, Data Factory, Event Hubs, Stream Analytics, Data Explorer, Power BI, ML, OpenAI, AI Search

Why Architecture Matters More Than Services

  +================================================================+
  |                                                                |
  |   Services are Lego bricks.   Architecture is the castle.      |
  |                                                                |
  |   Any junior can spin up ADLS + Synapse.                       |
  |   Seniors know WHEN to use which, WHY, and HOW they join.      |
  |                                                                |
  +================================================================+

Estimated Duration

2-3 hours (concept-heavy, no new hands-on required; uses the Part 1 lab as the anchor)

How to Use This Codelab

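  +-----------------------------+----------------------------------------------------------------------------------+
  |  If you are...              |  Do this                                                                         |
  +-----------------------------+----------------------------------------------------------------------------------+
  |  🎓 A student new to cloud  |  Read top-to-bottom; pause at each reference architecture                        |
  |  🛠 A working engineer      |  Skim sections 1-3, deep-read the reference architectures for patterns you ship  |
  |  🏗 A solution architect    |  Use the reference diagrams as whiteboard starters with stakeholders             |
  |  🔖 A reference reader      |  Jump to the Quiz, the Cheat Sheet, and the Appendix of Resources                |
  +-----------------------------+----------------------------------------------------------------------------------+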

💡 HitaVir Tech says: "Services change names every few years: SQL DW became Synapse, Synapse is being folded into Fabric, and many Hadoop workloads moved from HDInsight to Databricks. But the shapes of data architectures stay stable for decades. Master the shapes. You will pick up the services in a week."

What You Should Know

Required

Helpful

Mental Model You Already Have (From Part 1)

ADLS Gen2, Synapse, Databricks, Data Factory, AI Vision, Event Hubs, Stream Analytics, Functions, Defender, Power BI, ML, OpenAI

  +--------------------------------------------------------------+
  |   5 Vs framework          Azure service toolkit              |
  |   -------------------     ------------------------           |
  |   Volume                  ADLS Gen2, Synapse, Databricks     |
  |   Variety                 ADF, Synapse, AI Vision, Doc Intel.|
  |   Velocity                Event Hubs, Stream Analytics, Fn   |
  |   Veracity                ADF flows, Purview, Defender       |
  |   Value                   Power BI, Azure ML, OpenAI         |
  +--------------------------------------------------------------+

In Part 2 we compose these services into proven shapes.

No Paid Resources Required

Part 2 is concept-heavy. Every diagram is annotated with the services you already met in Part 1. The Part 1 hands-on lab is the practical anchor; this codelab teaches the architectures that scale it up.

⚠ If you choose to experiment with Synapse Dedicated Pools or Databricks, note that they can burn through free credits quickly. Use serverless modes and delete resource groups the same day.

  +==============================================================+
  |         SECTION  1  -  ARCHITECTURES (the big three)         |
  +==============================================================+

The Three Architecture Icons

ADLS Gen2, Synapse, Databricks

Three architecture patterns underpin most modern analytics in production:

                    +----------------------+
                    |  1.  DATA LAKE       |
                    |  Store anything,     |
                    |  cheap and forever   |
                    +----------+-----------+
                               |
                               | grew alongside
                               v
                    +----------------------+
                    |  2.  DATA WAREHOUSE  |
                    |  Fast SQL on         |
                    |  curated tables      |
                    +----------+-----------+
                               |
                               | combined into
                               v
                    +----------------------+
                    |  3.  LAKEHOUSE       |
                    |  (Modern Data Arch.) |
                    |  Best of both        |
                    +----------------------+

Part 2 tours each architecture, shows the Azure services that implement it, then demonstrates how real companies blend all three for different use cases.

  +==============================================================+
  |              ARCHITECTURE  1   -   DATA  LAKE                |
  |              "Store first, schema later."                    |
  +==============================================================+

ADLS Gen2, Policy, Data Shares

What is a Data Lake?

🪣 A data lake is a centralized repository that stores any type of data (structured, semi-structured, unstructured) at any scale, in its native format, typically on cheap object storage like Azure Data Lake Storage Gen2.

The defining move: you ingest now and decide the schema later (called schema-on-read). Contrast with warehouses, which demand schema-on-write.
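The contrast between schema-on-read and schema-on-write can be sketched in plain Python. This is only an illustration: the sample records and the helper names `read_with_schema` and `load_with_schema` are assumptions made up here, and in a real lake the files would sit in ADLS while an engine such as Synapse or Spark applies the schema.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no schema enforced).
RAW_LINES = [
    '{"order_id": 1, "amount": "19.99", "country": "IN"}',
    '{"order_id": 2, "amount": "5.00", "coupon": "WELCOME"}',
]


def read_with_schema(lines, schema):
    """Schema-on-read: pick columns and cast types only at query time."""
    for line in lines:
        record = json.loads(line)
        yield {
            col: (cast(record[col]) if col in record else None)
            for col, cast in schema.items()
        }


def load_with_schema(lines, required):
    """Schema-on-write: reject any row that does not fit before storing it."""
    table = []
    for line in lines:
        record = json.loads(line)
        missing = required - set(record)
        if missing:
            raise ValueError(f"row rejected, missing columns: {sorted(missing)}")
        table.append(record)
    return table


# The lake accepted both rows as-is; the schema is chosen by the reader.
rows = list(read_with_schema(RAW_LINES, {"order_id": int, "amount": float}))
```

With schema-on-read, a different reader could apply a different schema to the same raw lines tomorrow. With schema-on-write, `load_with_schema(RAW_LINES, {"order_id", "country"})` stops at the second row because it has no `country`.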

Data Lake - The Core Idea

  +------------------------------------------------------------+
  |                                                            |
  |   ANY DATA  --->   ADLS Gen2 (object store)   --->  ENGINES|
  |                                                            |
  |   CSV, JSON,        cheap, durable,          Synapse,      |
  |   Parquet, Delta,   infinite scale,          Databricks,   |
  |   logs, images,     one source of truth      HDInsight,    |
  |   Kafka events                               Power BI, ML  |
  |                                                            |
  +------------------------------------------------------------+

One lake. Many engines. That is the core promise.

Why Data Lakes Emerged

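Before ~2010, analytics meant warehouses: expensive, schema-strict, row-limited. Then data exploded:

  +-------------------------------------------------+-----------------------------+
  |  Problem With Warehouse-Only World              |  Who Felt It                |
  +-------------------------------------------------+-----------------------------+
  |  💸 Warehouse storage cost $1000s / TB / month  |  Every CFO                  |
  |  🚫 Could not store PDFs, images, videos        |  Healthcare, retail, media  |
  |  🐢 Schema changes took weeks                   |  Fast-moving startups       |
  |  ⛔ Historical data deleted to save cost        |  Regulated industries       |
  +-------------------------------------------------+-----------------------------+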

Data lakes fixed this by leveraging cheap object storage (ADLS at ~$0.018 / GB / month) and decoupling compute from storage.
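To make the price point concrete, here is a small back-of-the-envelope calculation. It takes the approximate ~$0.018 / GB / month figure quoted above at face value and assumes 1 TB = 1024 GB; actual Azure pricing varies by region, tier, and redundancy, so treat the result as an order-of-magnitude sketch only.

```python
ADLS_PRICE_PER_GB_MONTH = 0.018  # approximate figure quoted in the text


def monthly_storage_cost(terabytes, price_per_gb=ADLS_PRICE_PER_GB_MONTH):
    """Return the monthly storage cost in dollars (assumes 1 TB = 1024 GB)."""
    return terabytes * 1024 * price_per_gb


# Storing 100 TB of history at the quoted lake price.
cost_100_tb = monthly_storage_cost(100)
print(f"100 TB in the lake: about ${cost_100_tb:,.0f} per month")
```

At that rate 100 TB comes to roughly $1,843 per month, which is why "keep everything" becomes a realistic policy on a lake.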

Data Lake Zones - The Medallion Pattern

  abfss://lake@hitavirtech.dfs.core.windows.net/
    |
    +-- raw/          <--  BRONZE:  untouched, as ingested
    |                      + source of truth
    |                      + can replay anything from here
    |
    +-- curated/      <--  SILVER:  cleaned, typed, Delta/Parquet
    |                      + deduped, quality-checked
    |                      + partitioned for fast scans
    |
    +-- analytics/    <--  GOLD:    pre-aggregated, BI-ready
                           + joins done once
                           + powers Power BI and ML features
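The flow from raw to curated to analytics can be sketched with in-memory lists standing in for the three folders. The sample rows and the function names `to_silver` and `to_gold` are hypothetical; a real pipeline would read and write files under the `raw/`, `curated/`, and `analytics/` paths using ADF, Spark, or a similar engine.

```python
from collections import defaultdict

# Hypothetical bronze rows: strings exactly as ingested, duplicates included.
bronze = [
    {"order_id": "1", "country": "IN", "amount": "100.0"},
    {"order_id": "1", "country": "IN", "amount": "100.0"},         # duplicate
    {"order_id": "2", "country": "US", "amount": "250.5"},
    {"order_id": "3", "country": "IN", "amount": "not-a-number"},  # bad row
]


def to_silver(rows):
    """Clean, type, and de-duplicate; rows that fail type checks are dropped."""
    seen, silver = set(), []
    for row in rows:
        try:
            typed = {
                "order_id": int(row["order_id"]),
                "country": row["country"],
                "amount": float(row["amount"]),
            }
        except ValueError:
            continue
        if typed["order_id"] not in seen:
            seen.add(typed["order_id"])
            silver.append(typed)
    return silver


def to_gold(rows):
    """Pre-aggregate revenue per country so BI tools do not repeat the work."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Because bronze is never modified, the silver and gold steps can be rerun at any time, which is the "can replay anything from here" promise in the diagram.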

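  +--------------------+--------+---------------------------------------------+----------------------------+
  |  Zone              |  Icon  |  Shape                                      |  Readers                   |
  +--------------------+--------+---------------------------------------------+----------------------------+
  |  Raw / Bronze      |  🥉    |  Original bytes: CSV, JSON, images, dumps   |  Data engineers only       |
  |  Curated / Silver  |  🥈    |  Cleaned, typed, often Delta + partitions   |  Analysts, ML engineers    |
  |  Analytics / Gold  |  🥇    |  Aggregated, ready for Power BI and models  |  Business users, BI tools  |
  +--------------------+--------+---------------------------------------------+----------------------------+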

Delta Lake - The ACID Layer on Your Data Lake

  +--------------------------------------------------------------+
  |  DELTA  LAKE  -  ACID Transactions on ADLS                   |
  +--------------------------------------------------------------+
  |  Format    :  Parquet files + JSON transaction log           |
  |  Superpower:  UPDATE, DELETE, MERGE on lake files            |
  |  Travel    :  Time travel (query any past version)           |
  |  Used by   :  Databricks, Synapse Spark, Microsoft Fabric    |
  |                                                              |
  |  Solves the "warehouse features on lake files" problem.      |
  +--------------------------------------------------------------+
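The "data files plus a transaction log" idea in the box can be illustrated with a toy model. This is not the Delta Lake protocol or its API: the class `ToyDeltaTable`, its methods, and the file names are invented here purely to show how an append-only log makes atomic rewrites and time travel possible.

```python
import json


class ToyDeltaTable:
    """Toy model: immutable data files plus an append-only JSON commit log."""

    def __init__(self):
        self.log = []  # one JSON string per commit; version = list index

    def commit(self, add=(), remove=()):
        """Record which files a change adds and removes, as one atomic entry."""
        self.log.append(json.dumps({"add": list(add), "remove": list(remove)}))
        return len(self.log) - 1  # the version number just written

    def files_at(self, version=None):
        """Replay the log up to `version` to find the live data files."""
        if version is None:
            version = len(self.log) - 1
        live = set()
        for entry in self.log[: version + 1]:
            actions = json.loads(entry)
            live |= set(actions["add"])
            live -= set(actions["remove"])
        return sorted(live)


table = ToyDeltaTable()
v0 = table.commit(add=["part-000.parquet"])
v1 = table.commit(add=["part-001.parquet"])
# An UPDATE rewrites a file: drop the old one and add the new one in one commit.
v2 = table.commit(add=["part-002.parquet"], remove=["part-000.parquet"])
```

Calling `table.files_at()` returns the current files, while `table.files_at(0)` answers "what did the table look like at version 0?", which is the essence of time travel.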

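Data Lake on Azure - The Service Stack

ADLS Gen2, Policy, Data Factory, Synapse, ML

  +--------------+--------+------------------------------------+-----------------------------------------------------+
  |  Layer       |  Icon  |  Purpose                           |  Service                                            |
  +--------------+--------+------------------------------------+-----------------------------------------------------+
  |  Storage     |  🪣    |  Raw bytes, infinite scale         |  ADLS Gen2                                          |
  |  Governance  |  🔐    |  Permissions, catalog, lineage     |  Microsoft Purview                                  |
  |  Cataloging  |  📚    |  Schema + lineage                  |  Purview + Synapse Catalog                          |
  |  ETL / ELT   |  🕸    |  Move raw -> curated -> analytics  |  Azure Data Factory, Synapse Pipelines, Databricks  |
  |  Query       |  🔍    |  SQL on lake files                 |  Synapse Serverless SQL                             |
  |  ML          |  🤖    |  Train on lake data directly       |  Azure Machine Learning                             |
  +--------------+--------+------------------------------------+-----------------------------------------------------+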

Service Spotlight - Microsoft Purview

Purview, Info Protection, Policy

  +--------------------------------------------------------------+
  |  MICROSOFT  PURVIEW  -  Unified Data Governance              |
  +--------------------------------------------------------------+
  |  Catalog   :  Scan ADLS, Synapse, SQL, Power BI, S3, GCS     |
  |  Discovery :  Auto-classify sensitive data (500+ types)      |
  |  Lineage   :  End-to-end, column-level, across services      |
  |  Policy    :  Data access governance across clouds           |
  |                                                              |
  |  Turns a raw ADLS account into a governed, multi-tenant lake.|
  +--------------------------------------------------------------+

Purview is what lets one ADLS account serve 20 teams without everyone seeing everyone else's columns.

Data Lake Strengths and Weaknesses

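  +-----------------------------------------------------------+--------+------------------------------------------------------+--------+
  |  Strength                                                 |  Icon  |  Weakness                                            |  Icon  |
  +-----------------------------------------------------------+--------+------------------------------------------------------+--------+
  |  Cheap per GB                                             |  💰    |  Can become a "data swamp" without governance        |  🐊    |
  |  Any format                                               |  🧩    |  Slower queries than a warehouse on the same data    |  🐢    |
  |  Separates storage and compute                            |  🔀    |  Schema enforcement is optional (and often skipped)  |  🫥    |
  |  Multi-engine access (Synapse, Databricks, Power BI, ML)  |  🌐    |  Harder for business users to self-serve             |  😵    |
  +-----------------------------------------------------------+--------+------------------------------------------------------+--------+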

The Data Swamp - How Lakes Fail

  +--------------------------------------------------------------+
  |                                                              |
  |   NO CATALOG         ->  "Which container has customers?"    |
  |   NO QUALITY RULES   ->  "Why are 40% of amounts negative?"  |
  |   NO GOVERNANCE      ->  "Who deleted last quarter's data?"  |
  |   NO LIFECYCLE       ->  "We're paying for 2014 clickstream" |
  |                                                              |
  |             ==>  DATA SWAMP  (useless, expensive)            |
  +--------------------------------------------------------------+

Every successful data lake is paired with Purview + quality rules + lifecycle policies + Azure Policy. Skip these and your lake drowns.
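A quality rule does not have to be elaborate to be useful. The sketch below shows one hypothetical rule inspired by the "40% of amounts negative" symptom above; the function name `quality_report`, the 1% threshold, and the sample batch are assumptions for illustration, not a Purview or ADF feature.

```python
def quality_report(rows, max_negative_share=0.01):
    """Flag a batch when the share of negative amounts exceeds the threshold."""
    negatives = sum(1 for row in rows if row["amount"] < 0)
    share = negatives / len(rows) if rows else 0.0
    return {"negative_share": share, "passed": share <= max_negative_share}


# Hypothetical incoming batch: 4 of 10 amounts are negative.
batch = [{"amount": a} for a in (5, -1, 7, -3, 2, -8, 9, -4, 1, 6)]
report = quality_report(batch)

if not report["passed"]:
    print(f"Batch quarantined: {report['negative_share']:.0%} negative amounts")
```

Running a gate like this between the raw and curated zones means bad batches are caught before analysts ever see them.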

💡 HitaVir Tech says: "A data lake without a catalog is a data swamp. A data lake without quality rules is a liability. Governance is not optional: it is the difference between an asset and a landfill."

🪣 Data lake in one line: store everything cheaply, govern it strictly, query it from any engine.

  +==============================================================+
  |           ARCHITECTURE  2   -   DATA  WAREHOUSE              |
  |           "Fast SQL on curated, trusted data."               |
  +==============================================================+

Synapse, SQL DW, Analysis Services

What is a Data Warehouse?

🏛 A data warehouse is a centralized, highly structured database optimized for analytical queries (aggregations, joins, and scans across billions of rows) at interactive speeds.

Key properties:

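  +----------------------------------------+--------+-----------------------------------------------------------+
  |  Property                              |  Icon  |  What It Means                                            |
  +----------------------------------------+--------+-----------------------------------------------------------+
  |  Schema-on-write                       |  📐    |  Every row fits a predefined schema at load time          |
  |  Columnar storage                      |  📊    |  Stores columns together, not rows: 10-100x faster scans  |
  |  MPP (Massively Parallel Processing)   |  ⚡    |  Splits work across many compute nodes automatically      |
  |  Optimized for reads                   |  📖    |  Writes are slower; reads are lightning-fast              |
  |  Business-user friendly                |  👥    |  Clean star schemas; analysts can self-serve SQL          |
  +----------------------------------------+--------+-----------------------------------------------------------+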

Row vs Columnar Storage

  ROW STORE (OLTP, e.g. Azure SQL DB)
  -----------------------------------
  [id | name | country | amount]  <-- each row stored together

  Great for:  "Get everything about order 1042"
  Bad for:    "SUM(amount) across 1B rows"

  COLUMNAR STORE (OLAP, e.g. Synapse Dedicated, Parquet)
  ------------------------------------------------------
  [id][id][id]...
  [name][name][name]...
  [country][country][country]...
  [amount][amount][amount]...      <-- each column stored together

  Great for:  "SUM(amount) across 1B rows"  (scan only one column)
  Bad for:    "Get everything about order 1042"

Warehouses use columnar. That one design choice is why they can aggregate billions of rows in seconds.
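The trade-off in the diagram can be shown with two in-memory layouts of the same data. The sample orders and the function names are made up for this sketch; real columnar engines add compression and vectorized execution on top of the basic layout idea.

```python
# Row layout: each order is stored together (good for point lookups).
rows = [
    {"id": 1, "name": "Asha", "country": "IN", "amount": 120.0},
    {"id": 2, "name": "Ben", "country": "US", "amount": 80.0},
    {"id": 3, "name": "Chen", "country": "SG", "amount": 200.0},
]

# Columnar layout: one list per column; values at the same index form a row.
columns = {key: [row[key] for row in rows] for key in rows[0]}


def sum_amount_row_store(rows):
    """Touches every full row even though only one field is needed."""
    return sum(row["amount"] for row in rows)


def sum_amount_column_store(columns):
    """Reads only the 'amount' column and ignores the rest."""
    return sum(columns["amount"])


def get_order_column_store(columns, order_id):
    """A point lookup must stitch one row back together from every column."""
    i = columns["id"].index(order_id)
    return {key: values[i] for key, values in columns.items()}
```

Both sums give the same answer, but the columnar version never looks at `name` or `country`. On a table with hundreds of columns and billions of rows, that skipped I/O is where most of the speed-up comes from.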

Star Schema - The Warehouse Language

Most warehouse tables follow the star schema:

                    +---------------------+
                    |   DIM_CUSTOMER      |
                    |   (who bought)      |
                    +----------+----------+
                               |
                               |
  +---------------+    +---------------+    +---------------+
  | DIM_PRODUCT   |----|  FACT_SALES   |----|  DIM_DATE     |
  | (what sold)   |    |  (the event)  |    | (when sold)   |
  +---------------+    +-------+-------+    +---------------+
                               |
                               |
                    +----------+----------+
                    |   DIM_STORE         |
                    |   (where sold)      |
                    +---------------------+

Star schemas make queries fast AND readable: SELECT country, SUM(amount) FROM fact_sales JOIN dim_store ....
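The query fragment above can be tried end to end with Python's built-in SQLite, used here only as a stand-in for a warehouse engine. The column names (`store_id`, `country`, `amount`) and the sample rows are assumptions; T-SQL on Synapse would look very similar for this simple case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: where the sale happened.
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, country TEXT);
    -- Fact: one row per sale, pointing at its dimensions by key.
    CREATE TABLE fact_sales (
        sale_id  INTEGER PRIMARY KEY,
        store_id INTEGER REFERENCES dim_store(store_id),
        amount   REAL
    );
    INSERT INTO dim_store VALUES (1, 'IN'), (2, 'US');
    INSERT INTO fact_sales VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
""")

# Join the fact table to one dimension and aggregate.
revenue_by_country = conn.execute("""
    SELECT d.country, SUM(f.amount) AS revenue
    FROM fact_sales AS f
    JOIN dim_store AS d ON d.store_id = f.store_id
    GROUP BY d.country
    ORDER BY d.country
""").fetchall()
```

The fact table stays narrow (keys and measures), and every descriptive attribute lives in a dimension, so adding `dim_date` or `dim_product` to the query is just one more join.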

Service Spotlight - Azure Synapse Dedicated SQL Pool

  +--------------------------------------------------------------+
  |  SYNAPSE  DEDICATED  SQL  POOL  (formerly SQL DW)            |
  +--------------------------------------------------------------+
  |  Category  :  Columnar MPP warehouse                         |
  |  Distrib.  :  Round-robin, hash, replicated                  |
  |  Scale     :  DW100c to DW30000c, petabyte range             |
  |  SQL       :  T-SQL (SQL Server flavored)                    |
  |  Pairings  :  PolyBase, COPY statement, Power BI             |
  |                                                              |
  |  The engine behind Mars, Rolls-Royce, Walmart analytics.     |
  +--------------------------------------------------------------+
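The distribution options listed in the box decide which slice of the pool stores each row. The sketch below illustrates hash and round-robin placement across the 60 distributions a dedicated SQL pool uses. The hash function here is a toy chosen for the example (Synapse's internal hash is not SHA-256), and the customer ids are made up.

```python
import hashlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def hash_distribution(key, n=NUM_DISTRIBUTIONS):
    """Same key always lands on the same distribution (toy hash, not Synapse's)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def round_robin_distribution(row_number, n=NUM_DISTRIBUTIONS):
    """Rows are dealt out in turn, regardless of their content."""
    return row_number % n


# Place 6000 hypothetical customer ids and count rows per distribution.
placement = Counter(hash_distribution(customer_id) for customer_id in range(6000))
```

Hash distribution keeps all rows for one key together, so joins and aggregations on that key avoid shuffling data between nodes. Round-robin spreads load evenly but gives no such locality, and replicated tables (not shown) simply copy a small dimension to every node.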

Synapse Serverless SQL - Warehouse Queries Over the Lake

                 +----------------------------+
                 |  Dedicated SQL Pool        |
                 |  (hot, curated tables)     |
                 +-------------+--------------+
                               |
                               | joins across
                               v
                 +----------------------------+
                 |  ADLS via Serverless SQL   |
                 |  (cold, historical data)   |
                 +----------------------------+

Through external tables, one T-SQL query can join the warehouse (recent, hot) with files in the lake (years of history). No duplicate storage, no duplicate pipelines.

Warehouse Loading Patterns

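  +------------------------------------+--------+--------------------------------------------+---------+
  |  Source                            |  Icon  |  Loader                                    |  Speed  |
  +------------------------------------+--------+--------------------------------------------+---------+
  |  ADLS files                        |  🪣    |  COPY statement (parallel)                 |  🚀     |
  |  Event Hubs                        |  🌊    |  Synapse streaming ingestion               |  🚀     |
  |  Azure SQL / Postgres              |  🗄    |  Azure Data Factory + CDC                  |  🏃     |
  |  SaaS apps (Dynamics, Salesforce)  |  ☁    |  Data Factory connectors / Power Automate  |  🚶     |
  +------------------------------------+--------+--------------------------------------------+---------+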

Data Warehouse Strengths and Weaknesses

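  +---------------------------------------------+--------+-----------------------------------------+--------+
  |  Strength                                   |  Icon  |  Weakness                               |  Icon  |
  +---------------------------------------------+--------+-----------------------------------------+--------+
  |  Sub-second SQL on billions of rows         |  ⚡    |  Expensive per TB stored                |  💸    |
  |  Business-analyst friendly                  |  👥    |  Rigid schema: changes need migrations  |  🔒    |
  |  Mature BI tool ecosystem (Power BI)        |  📊    |  Only handles structured data           |  📋    |
  |  ACID transactions and governance baked-in  |  🛡    |  Locked into one vendor's engine        |  🔗    |
  +---------------------------------------------+--------+-----------------------------------------+--------+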

Lake vs Warehouse - The Canonical Comparison

  +--------------------+------------------------+------------------------+
  |  ATTRIBUTE         |  DATA  LAKE            |  DATA  WAREHOUSE       |
  +--------------------+------------------------+------------------------+
  |  Data type         |  Anything              |  Structured only       |
  |  Schema            |  On read               |  On write              |
  |  Cost / TB stored  |  $  (cheap)            |  $$$$  (expensive)     |
  |  Query speed       |  Medium                |  Fast                  |
  |  Users             |  Engineers, data sci.  |  Analysts, business    |
  |  Azure example     |  ADLS + Serverless SQL |  Synapse Dedicated     |
  +--------------------+------------------------+------------------------+

💡 HitaVir Tech says: "Warehouses are optimized for the answers you know you want. Lakes are optimized for the answers you haven't invented questions for yet. Real companies need both. The next section shows how to stop choosing and combine them."

🏛 Warehouse in one line: columnar + MPP + star schema = fast answers for business users.

  +==============================================================+
  |       ARCHITECTURE  3   -   MODERN  DATA  ARCHITECTURE       |
  |         (aka the "Lakehouse" pattern on Azure)               |
  +==============================================================+

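ADLS Gen2, Synapse, Databricks, Power BI, ML, Data Factory

The Problem Modern Data Architecture Solves

By 2018, most companies had both a lake and a warehouse, and suffered:

  +---------------------------+--------+-------------------------------------------------------------------+
  |  Pain                     |  Icon  |  Symptom                                                          |
  +---------------------------+--------+-------------------------------------------------------------------+
  |  Two copies of the truth  |  👯    |  Lake says one number, warehouse says another                     |
  |  Pipeline sprawl          |  🕸    |  200 Data Factory jobs shuffling data between them                |
  |  Permission chaos         |  🔐    |  Entra ID for SQL, ACLs for storage, separate audits              |
  |  Skill silos              |  🧑‍💻    |  Data engineers in Spark, analysts in SQL, no common tool         |
  |  ML engineers stuck       |  🤖    |  Data scientists denied warehouse access, scraping lakes by hand  |
  +---------------------------+--------+-------------------------------------------------------------------+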

The Modern Data Architecture Idea

  +==============================================================+
  |                                                              |
  |    ONE GOVERNED PLATFORM                                     |
  |                                                              |
  |    - Unified storage  (ADLS Gen2 = source of truth)          |
  |    - Open table format  (Delta Lake / Iceberg)               |
  |    - Purpose-built engines  (pick the right tool per job)    |
  |    - Shared catalog + governance  (Purview)                  |
  |    - Common security model  (Entra ID + RBAC + Key Vault)    |
  |                                                              |
  +==============================================================+

Instead of lake or warehouse, you get lake and warehouse, unified by one catalog, one permission model, one lineage.

Microsoft Fabric โ€” The Unified SaaS Lakehouse

Synapse, Data Factory, Data Explorer, Power BI, ML, ADLS Gen2

In 2023 Microsoft launched Fabric, a single SaaS surface that bundles:

  +-------------------------------+----------------------------------------------------+
  |  Fabric Pillar                |  Under the Hood                                    |
  +-------------------------------+----------------------------------------------------+
  |  OneLake                      |  One tenant-wide ADLS, "OneDrive for data"         |
  |  Data Factory                 |  Ingest and orchestrate (same engine as ADF)       |
  |  Synapse Data Engineering     |  Spark notebooks on Delta Lake                     |
  |  Synapse Data Warehouse       |  T-SQL warehouse over Delta (not Dedicated Pool!)  |
  |  Synapse Real-Time Analytics  |  KQL / Data Explorer on streams                    |
  |  Power BI                     |  Native BI over OneLake (Direct Lake mode)         |
  |  Data Activator               |  Trigger actions from data signals                 |
  |  Copilot                      |  Natural-language help across all pillars          |
  +-------------------------------+----------------------------------------------------+

Fabric is where Azure analytics is heading. The Part 1 services still exist; Fabric simply bundles them on one pricing model, one identity, one lakehouse.

The Five Pillars of Modern Data Architecture on Azure

     +---------+    +--------+    +--------+    +--------+    +--------+
     |    1    |    |   2    |    |   3    |    |   4    |    |   5    |
     | SCALABLE|    |PURPOSE-|    |SEAMLESS|    |UNIFIED |    |FUTURE- |
     |   DATA  |    |  BUILT |    |  DATA  |    |GOVERN- |    |  PROOF |
     |   LAKE  |    | ENGINES|    |MOVEMENT|    |  ANCE  |    |   ML   |
     +---------+    +--------+    +--------+    +--------+    +--------+
          |             |             |             |             |
          v             v             v             v             v
       ADLS +        Synapse,      Shortcuts,   Purview +     Azure ML,
       Fabric        Databricks,   Mirroring,   Entra ID +    OpenAI,
       OneLake       Power BI,     ADF zero-    Key Vault +   Synapse
                     Data Explorer  copy link   Defender      ML, Fabric

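Pillar 1 - A Scalable Data Lake at the Core

Every modern Azure architecture starts from ADLS Gen2 / OneLake. Why?

  +-------------------------------------------------------+--------+---------------------------------------+
  |  Reason                                               |  Icon  |  Impact                               |
  +-------------------------------------------------------+--------+---------------------------------------+
  |  Infinite scale                                       |  ♾    |  Never outgrow it                     |
  |  11 nines durability (LRS), 16 with GRS               |  🛡    |  Your data is safer than on any disk  |
  |  Pennies per GB                                       |  💰    |  Keep history forever                 |
  |  Native reader for Synapse, Databricks, Power BI, ML  |  🌐    |  One source, many consumers           |
  +-------------------------------------------------------+--------+---------------------------------------+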

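Pillar 2 - Purpose-Built Engines

One-size-fits-all is dead. Pick the right engine per workload:

  +--------------------------------+--------+----------------------------------------+--------------------------------------+
  |  Workload                      |  Icon  |  Engine                                |  Why                                 |
  +--------------------------------+--------+----------------------------------------+--------------------------------------+
  |  Ad-hoc SQL on lake files      |  🔍    |  Synapse Serverless SQL                |  Pay per TB scanned, zero setup      |
  |  Dashboards on curated tables  |  🏛    |  Synapse Dedicated / Fabric Warehouse  |  Sub-second BI                       |
  |  Petabyte Spark / ML           |  🔥    |  Databricks / Synapse Spark            |  Custom transforms + ML at scale     |
  |  Single-digit-ms lookups       |  ⚡    |  Cosmos DB                             |  Document / key-value queries        |
  |  Full-text + vector search     |  🔎    |  AI Search                             |  RAG, log search, enterprise search  |
  |  Real-time aggregation         |  🎯    |  Stream Analytics / ADX                |  Streaming SQL / KQL                 |
  +--------------------------------+--------+----------------------------------------+--------------------------------------+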

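Pillar 3 - Seamless Data Movement

Instead of 200 brittle ETL jobs, modern Azure architectures rely on the zero-copy options named in the pillars diagram: OneLake shortcuts, Fabric Mirroring, and ADF zero-copy links.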

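Pillar 4 - Unified Governance

Policy, Key Vault, Defender, Activity Log

  +-----------------------+--------+---------------------------------------------+----------------------------------+
  |  Layer                |  Icon  |  Service                                    |  Purpose                         |
  +-----------------------+--------+---------------------------------------------+----------------------------------+
  |  Identity             |  🔑    |  Microsoft Entra ID (formerly AAD)          |  Who you are                     |
  |  Fine-grained access  |  🔐    |  Azure RBAC + ADLS ACLs + Purview policies  |  What you can do                 |
  |  Encryption           |  🔒    |  Key Vault + Customer-Managed Keys          |  At-rest and in-transit          |
  |  PII scanning         |  🕵    |  Microsoft Purview + Defender for Cloud     |  Find sensitive data             |
  |  Audit                |  📜    |  Activity Log + Azure Monitor               |  Every API call, every query     |
  |  Data discovery       |  🧭    |  Microsoft Purview                          |  Business-friendly data catalog  |
  |  Policy-as-code       |  ⚙    |  Azure Policy                               |  Enforce rules on resources      |
  +-----------------------+--------+---------------------------------------------+----------------------------------+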

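Pillar 5 - Built-in AI / ML

ML, OpenAI

ML is no longer a bolt-on; it lives inside the platform:

  +----------------------------------------------------+--------+-------------------------------------------------+
  |  Capability                                        |  Icon  |  Service                                        |
  +----------------------------------------------------+--------+-------------------------------------------------+
  |  Full ML lifecycle                                 |  🤖    |  Azure Machine Learning                         |
  |  Foundation models (GPT-4, partner models, Llama)  |  🧠    |  Azure OpenAI Service / Azure AI model catalog  |
  |  Spark-native ML in notebooks                      |  🔥    |  Databricks MLflow / Synapse ML                 |
  |  No-code ML                                        |  📈    |  Azure ML Designer / AutoML                     |
  |  Natural-language BI                               |  💬    |  Power BI Copilot                               |
  |  RAG                                               |  🔎    |  AI Search + Azure OpenAI                       |
  +----------------------------------------------------+--------+-------------------------------------------------+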

The Modern Data Architecture - One Picture

  +==================================================================+
  |                                                                  |
  |   +---------+   +---------+   +---------+   +---------+          |
  |   | Batch   |   | Stream  |   | OLTP DB |   | SaaS    |          |
  |   | files   |   | events  |   | (CDC)   |   | apps    |          |
  |   +----+----+   +----+----+   +----+----+   +----+----+          |
  |        |             |             |             |               |
  |        +-------------+-------------+-------------+               |
  |                             |                                    |
  |                             v                                    |
  |     +-----------------------------------------------------+      |
  |     |    ADLS GEN2 / ONELAKE  ---  Centralized Lake       |      |
  |     |    Raw  ->  Curated  ->  Analytics  (Delta/Parquet) |      |
  |     +---------------------------+-------------------------+      |
  |                                 |                                |
  |                  governed by    |                                |
  |                                 v                                |
  |     +-----------------------------------------------------+      |
  |     | PURVIEW + ENTRA ID + KEY VAULT + DEFENDER + POLICY  |      |
  |     +-----------------------------------------------------+      |
  |                                 |                                |
  |       +------------+------------+------------+------------+      |
  |       |            |            |            |            |      |
  |       v            v            v            v            v      |
  |   +-------+   +---------+   +------+   +-----------+   +------+  |
  |   |Synapse|   |Synapse  |   |Data- |   |Azure Data |   |Azure |  |
  |   |Server-|   |Dedicated|   |bricks|   |Explorer   |   |ML /  |  |
  |   |less   |   |(MPP)    |   |Spark |   |  (KQL)    |   |OpenAI|  |
  |   +---+---+   +----+----+   +---+--+   +-----+-----+   +--+---+  |
  |       |            |            |            |            |      |
  |       +------------+------------+------------+------------+      |
  |                                 |                                |
  |                                 v                                |
  |            +----------------------------------------+            |
  |            |   VALUE  =  Power BI + Copilot + apps  |            |
  |            +----------------------------------------+            |
  |                                                                  |
  +==================================================================+

Look carefully: every service from Part 1 has a home. That is modern Azure data architecture.

Modern Data Architecture - In One Sentence

Centralize storage in a governed ADLS / OneLake. Use the best engine for each workload. Let data move frictionlessly between them. Secure it all uniformly. Build ML natively on top.

💡 HitaVir Tech says: "Don't build one monolith. Don't build 20 silos. Build one lake, with many engines, one catalog, one security model. That's how Azure's biggest analytics customers run."

🏡 Lakehouse in one line: one lake for storage, many engines for compute, one catalog for trust.

  +==============================================================+
  |         THE  COMPLETE  AZURE  SERVICE  MAP                   |
  |            for Modern Data Architecture                      |
  +==============================================================+

The Headline Cast for Part 2

ADLS Gen2, Synapse, Databricks, Data Factory, Event Hubs, Stream Analytics, Data Explorer, Power BI, ML, OpenAI, AI Search, Policy

Each service answers a specific question in the modern architecture:

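  +----------------------+------------------------------+-------------------------------------------------+--------+
  |  Layer               |  Question                    |  Service                                        |  Icon  |
  +----------------------+------------------------------+-------------------------------------------------+--------+
  |  Storage             |  Where does my data live?    |  ADLS Gen2 / OneLake                            |  🪣    |
  |  Governance          |  Who can see what?           |  Microsoft Purview                              |  🗝    |
  |  Catalog             |  What data do we have?       |  Purview + Synapse Catalog                      |  📚    |
  |  ETL / ELT           |  How do I shape it?          |  Azure Data Factory, Databricks, Synapse Spark  |  🕸    |
  |  SQL on lake         |  How do I explore?           |  Synapse Serverless SQL                         |  🔍    |
  |  SQL on warehouse    |  How do I serve BI?          |  Synapse Dedicated / Fabric Warehouse           |  🏛    |
  |  Stream ingest       |  How do I handle real time?  |  Event Hubs + Stream Analytics                  |  🌊    |
  |  Time-series / logs  |  How do I query logs?        |  Azure Data Explorer (KQL)                      |  🔎    |
  |  BI                  |  How do people see it?       |  Power BI                                       |  📊    |
  |  ML                  |  How do we predict?          |  Azure Machine Learning                         |  🤖    |
  |  LLMs                |  How do we add GenAI?        |  Azure OpenAI + AI Search                       |  🧠    |
  |  Real-time search    |  How do we find a needle?    |  Azure AI Search                                |  🔎    |
  +----------------------+------------------------------+-------------------------------------------------+--------+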

Service Spotlight - Azure Databricks

  +--------------------------------------------------------------+
  |  AZURE  DATABRICKS  -  The Lakehouse Pioneer                 |
  +--------------------------------------------------------------+
  |  Audience  :  Data engineers + data scientists + analysts    |
  |  Engine    :  Apache Spark (Photon) + Delta Lake             |
  |  Notebooks :  Python, SQL, R, Scala                          |
  |  ML        :  MLflow tracking, feature store, model serving  |
  |  Governance:  Unity Catalog (like Purview, for Databricks)   |
  |                                                              |
  |  The "full lakehouse in one product" option.                 |
  +--------------------------------------------------------------+

Zero-Copy - The Quiet Revolution on Azure

Classic ETL means writing code to extract-transform-load between systems. Azure's zero-copy features make data appear in another system without moving it:

  +-----------------+          +------------------------+
  | Azure SQL DB    |==Mirror=>| Microsoft Fabric       |
  |  (app DB)       |  managed | (Lakehouse, OneLake)   |
  +-----------------+          +------------------------+

  Zero-copy on Azure today:
  - Azure SQL        -> Fabric (Mirroring)
  - Cosmos DB        -> Fabric (Mirroring)
  - Snowflake        -> Fabric (Mirroring)
  - OneLake Shortcut -> ADLS Gen2 / S3 / GCS (no copy at all)

Fewer pipelines to maintain. Fresher analytics. Less on-call pain.
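Why a shortcut stays fresh while a copy goes stale can be shown with a toy model. The classes `Store` and `ToyOneLake`, their methods, and the paths below are invented for illustration and are not the OneLake or Fabric API; the point is only that a shortcut is resolved at read time.

```python
class Store:
    """Toy storage account: maps a path to its content."""

    def __init__(self):
        self.files = {}


class ToyOneLake(Store):
    """Toy lake that holds its own files plus pointers into other stores."""

    def __init__(self):
        super().__init__()
        self.shortcuts = {}  # local path -> (remote store, remote path)

    def add_shortcut(self, local_path, remote_store, remote_path):
        self.shortcuts[local_path] = (remote_store, remote_path)

    def read(self, path):
        if path in self.shortcuts:
            remote_store, remote_path = self.shortcuts[path]
            return remote_store.files[remote_path]  # resolved at read time
        return self.files[path]


adls = Store()
adls.files["sales/2024.parquet"] = "v1"

lake = ToyOneLake()
# Classic ETL: copy the bytes once; the copy knows nothing about later changes.
lake.files["copied/2024.parquet"] = adls.files["sales/2024.parquet"]
# Zero-copy: store only a pointer to the source.
lake.add_shortcut("linked/2024.parquet", adls, "sales/2024.parquet")

adls.files["sales/2024.parquet"] = "v2"  # the source is updated later
```

After the source changes, `lake.read("copied/2024.parquet")` still returns the old content until some pipeline reruns, while `lake.read("linked/2024.parquet")` sees the update immediately.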

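Azure Data Factory - The Connective Tissue

In a modern architecture, ADF is everywhere:

  +------------------------+--------+---------------------------------------------+
  |  Capability            |  Icon  |  Role                                       |
  +------------------------+--------+---------------------------------------------+
  |  Copy Activity         |  📋    |  Move data between 100+ sources and sinks   |
  |  Mapping Data Flows    |  🕸    |  Visual Spark-based transforms              |
  |  Triggers              |  ⏱    |  Schedule, event, tumbling window           |
  |  Integration Runtimes  |  🌐    |  Cloud or self-hosted, on-prem -> Azure     |
  |  Git integration       |  🔀    |  Pipelines as code (Azure DevOps / GitHub)  |
  |  Monitoring            |  📈    |  Built-in dashboards + Azure Monitor        |
  +------------------------+--------+---------------------------------------------+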

A Mental Shortcut - The Service Lookup Table

When someone describes a problem, scan this table first:

  +-------------------------------------------------------------+--------------------------------------------------+
  |  Problem Sounds Like...                                     |  Reach For                                       |
  +-------------------------------------------------------------+--------------------------------------------------+
  |  "We have terabytes piling up and need cheap storage"       |  🪣 ADLS Gen2 + 🧊 Archive tier                  |
  |  "Analysts want SQL on 1B rows, must return in seconds"     |  🏛 Synapse Dedicated / Fabric Warehouse         |
  |  "We dump random files hourly, want ad-hoc SQL"             |  🔍 Synapse Serverless + 🕸 ADF                  |
  |  "Events come at 1M / sec and drive a live dashboard"       |  🌊 Event Hubs + 🎯 Stream Analytics             |
  |  "Logs need to be searchable with keyword filters"          |  🔬 Azure Data Explorer (KQL)                    |
  |  "We keep 2 copies of the same data in lake and warehouse"  |  🔭 Serverless / Mirroring / Fabric Direct Lake  |
  |  "We need to share a slice with another Azure tenant"       |  🔗 Azure Data Share / Purview data policies     |
  |  "Non-engineers can't find any data"                        |  🧭 Microsoft Purview / Fabric catalog           |
  |  "We want the CEO to ask questions in English"              |  💬 Power BI Copilot                             |
  |  "Our support team wants to chat with our docs"             |  🔎 AI Search + 🧠 Azure OpenAI (RAG)            |
  +-------------------------------------------------------------+--------------------------------------------------+

💡 HitaVir Tech says: "When a junior asks 'which Azure service should we use?', the senior reply is another question: 'what is the actual pattern?' Service choice without pattern = tech for tech's sake."

  +==============================================================+
  |         SECTION  2   -   COMMON  USE  CASES                  |
  |           (where the patterns show up in real life)          |
  +==============================================================+

Most real-world analytics work on Azure falls into six repeatable patterns. Recognize them and you'll know which reference architecture to reach for.

The Service Cast Across All Six Patterns

Synapse, Power BI, Event Hubs, Stream Analytics, Data Explorer, ADLS Gen2, Data Factory, Azure ML, Databricks

The Six Patterns

  +-----------------------------------------------------------+
  |  1.  BATCH  BI           - nightly dashboards             |
  |  2.  REAL-TIME  ANALYTICS- live metrics, fraud, IoT       |
  |  3.  LOG  /  APP  OBS.   - search + troubleshoot logs     |
  |  4.  CUSTOMER  360       - unify profiles across sources  |
  |  5.  ML  /  PREDICTIVE   - forecast, recommend, score     |
  |  6.  DATA  MESH          - domain-owned, shared data      |
  +-----------------------------------------------------------+

Use Case 1 - Batch Business Intelligence

Services: Synapse, Power BI

Who needs it: Every company with a CFO.

Shape: Operational databases + flat files → data lake → warehouse → Power BI.

  Trait         Value
  ------------  ------------------------------------
  Freshness     Daily or hourly
  Volume        GB to TB
  Velocity V    Batch
  Core service  Synapse Dedicated / Fabric Warehouse

Example prompt: "Revenue by region compared to last quarter, refreshed every morning at 8 am."

Use Case 2 - Real-Time Analytics

Services: Event Hubs, Stream Analytics, Functions

Who needs it: Rideshare, fintech, ad-tech, IoT, online gaming.

Shape: Events → Event Hubs → stream processor → live dashboard and ADLS for history.

  Trait         Value
  ------------  -----------------------------
  Freshness     Sub-second to seconds
  Volume        Millions of events / sec
  Velocity V    Streaming
  Core service  Event Hubs + Stream Analytics

Example prompt: "Alert the risk team the moment any card transaction looks fraudulent."

Use Case 3 - Log & Application Observability

Services: Data Explorer, Log Analytics, Azure Monitor

Who needs it: Every engineering team at scale.

Shape: Application logs → Log Analytics / ADX → ADLS archive.

  Trait         Value
  ------------  -------------------------
  Freshness     Seconds
  Volume        TB/day in logs
  Velocity V    Streaming
  Core service  Azure Data Explorer (KQL)

Example prompt: "Search the last 30 days of production logs for any mention of this request ID."

Use Case 4 - Customer 360

Services: ADLS Gen2, Data Factory, Synapse, Azure ML

Who needs it: Retail, banking, telecom, SaaS.

Shape: Unify profiles from CRM, web, mobile, support into one view, served to marketing + ML.

  Trait         Value
  ------------  --------------------
  Freshness     Hourly
  Volume        TB
  Dominant V    Variety
  Core service  ADLS + ADF + Synapse

Example prompt: "Show one customer's full lifetime journey: ads seen, orders placed, tickets filed."

Use Case 5 - ML / Predictive Analytics

Services: Azure ML, Databricks, ADLS Gen2

Who needs it: Forecasting, recommendations, fraud, churn, dynamic pricing.

Shape: Lake → feature store → model training → model endpoint → prediction served in app or BI.

  Trait         Value
  ------------  ------------------------------------
  Freshness     Training weekly, inference real-time
  Volume        GB to PB of history
  Dominant V    Value
  Core service  Azure ML + ADLS

Example prompt: "Predict which customers will churn next month so we can retain them."

Use Case 6 - Data Mesh

Services: Azure Data Share, Azure Policy, ADLS Gen2

Who needs it: Enterprises with many product teams owning their own data.

Shape: Each domain team curates its own data products on ADLS / OneLake; a central catalog (Purview + Fabric) makes them discoverable and access-controlled.

  Trait         Value
  ------------  ---------------------------
  Freshness     Per-domain
  Ownership     Distributed to domain teams
  Dominant V    Veracity + Value
  Core service  Purview + Fabric domains

Example prompt: "The Finance team owns 'invoices', Marketing owns 'campaigns', but anyone at the company can discover and request access."

Picking the Right Pattern - Decision Cheat

                    What is the dominant question?
                                   |
       +----------+----------+-----+-----+-----------+-----------+
       |          |          |           |           |           |
    Weekly     Instant    Find a        One       Predict    Federated
     KPIs      alerts    log line    360 view     future     ownership
       |          |          |           |           |           |
       v          v          v           v           v           v
     BATCH      REAL-       LOG      CUSTOMER      ML /        DATA
      BI        TIME        OBS.        360        PRED.       MESH

HitaVir Tech says: "New engineers try to force every problem into the pattern they already know. Seniors look at the dominant V and pick the shape, then fill in services. Diagnose first. Build second."
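The decision cheat above can be sketched as a tiny lookup. This is only an illustration: `PATTERNS` and `pick_pattern` are hypothetical names, and the keys simply restate the six dominant questions from the diagram.

```python
# Hypothetical mapping: dominant question -> pattern name (from the diagram).
PATTERNS = {
    "weekly kpis": "Batch BI",
    "instant alerts": "Real-Time Analytics",
    "find a log line": "Log / App Observability",
    "one 360 view": "Customer 360",
    "predict future": "ML / Predictive",
    "federated ownership": "Data Mesh",
}


def pick_pattern(dominant_question: str) -> str:
    """Return the pattern for a dominant question, ignoring case and extra spaces."""
    key = " ".join(dominant_question.lower().split())
    if key not in PATTERNS:
        # Refuse to guess: an unknown question means the diagnosis is not done yet.
        raise ValueError(f"No pattern for {dominant_question!r}; diagnose first")
    return PATTERNS[key]


print(pick_pattern("Instant alerts"))
```

The function raises instead of returning a default on purpose: picking a shape without a clear dominant question is exactly the mistake the quote warns about.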

  +==============================================================+
  |           SECTION  3  -  REFERENCE  ARCHITECTURES            |
  +==============================================================+

For each use case, here is a whiteboard-ready Azure architecture you can copy, adapt, and defend in a design review.

Reference 1 - Batch BI on a Lakehouse

Services: ADLS Gen2, Data Factory, Synapse, Power BI

  Azure SQL / Postgres            Dynamics 365, Salesforce
           |                                |
           +---- ADF with CDC connector ----+
                                   |
                                   v
            +--------------------------------------+
            |     ADLS Gen2  Data Lake             |
            |     raw -> curated (Delta/Parquet)   |
            +---------------+----------------------+
                            |
                            v  (ADF Data Flows, Purview DQ, catalog)
                            |
            +---------------+----------------------+
            |   Synapse Dedicated SQL Pool         |
            |   - Star schemas                     |
            |   - Nightly COPY loads               |
            +---------------+----------------------+
                            |
                            v
                     Power BI
               (DirectQuery or Import)
                            |
                            v
                        Executives

When to pick it: Finance, ops, exec reporting. Stable queries, predictable loads.
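To make the star schema in this architecture concrete, here is a minimal sketch of the "revenue by region compared to last quarter" query. It uses Python's built-in sqlite3 as a stand-in for the Synapse Dedicated SQL Pool; the table names, columns, and sample rows are all assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (assumption, not Synapse).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE fact_sales (
    sale_id   INTEGER PRIMARY KEY,
    region_id INTEGER REFERENCES dim_region(region_id),
    quarter   TEXT,
    revenue   REAL
);
INSERT INTO dim_region VALUES (1, 'EMEA'), (2, 'APAC');
INSERT INTO fact_sales (region_id, quarter, revenue) VALUES
    (1, '2024-Q1', 100.0), (1, '2024-Q2', 150.0),
    (2, '2024-Q1', 80.0),  (2, '2024-Q2', 60.0);
""")


def revenue_by_region(connection, quarter):
    """Join the fact table to its dimension and aggregate revenue per region."""
    rows = connection.execute(
        """
        SELECT d.region_name, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_region AS d ON d.region_id = f.region_id
        WHERE f.quarter = ?
        GROUP BY d.region_name
        ORDER BY d.region_name
        """,
        (quarter,),
    ).fetchall()
    return dict(rows)


current = revenue_by_region(conn, "2024-Q2")
previous = revenue_by_region(conn, "2024-Q1")
change = {region: current[region] - previous[region] for region in current}
print(change)
```

The shape is what matters: one narrow fact table, small descriptive dimensions, and a join plus GROUP BY that a columnar MPP engine can spread across many nodes.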

Reference 2 - Real-Time Analytics

Services: Event Hubs, Stream Analytics, Functions, Data Explorer, ADLS Gen2, Power BI

  Producers (apps, IoT, clickstream)
             |
             v
   +-------------------------+
   |  Azure Event Hubs       |
   |  (durable, partitioned) |
   +------------+------------+
                |
    +-----------+------------+------------+
    |                        |            |
    v                        v            v
 Stream Analytics          Functions    Capture
 (continuous SQL)        (alerting on    |
    |                    anomalies)      v
    |                        |         ADLS
    v                        v       (Parquet, hist.)
 Live dashboard            Service         |
 (Power BI streaming)      Bus / Teams     v
 OR Data Explorer          alert        Synapse
                                        Serverless
                                         ad-hoc

When to pick it: Fraud detection, live pricing, real-time personalization, IoT.
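The "continuous SQL" box usually means a windowed aggregation over the event stream. The sketch below shows the idea in plain Python, counting events per card in fixed, non-overlapping (tumbling) windows and flagging bursts. The function name, the 10-second window, and the threshold of 3 are assumptions for illustration; this is not Stream Analytics code.

```python
from collections import Counter


def flag_bursts(events, window_seconds=10, threshold=3):
    """Return sorted (window_start, card_id) pairs with >= threshold events.

    events: iterable of (timestamp_in_seconds, card_id) tuples.
    """
    counts = Counter()
    for timestamp, card_id in events:
        # Every event belongs to exactly one tumbling window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, card_id)] += 1
    return sorted(key for key, count in counts.items() if count >= threshold)


events = [
    (0.5, "card-A"), (2.0, "card-A"), (3.1, "card-B"),
    (9.9, "card-A"), (10.0, "card-A"), (12.0, "card-B"),
]
alerts = flag_bursts(events)
print(alerts)
```

In the architecture above, the flagged pairs would be what the alerting branch forwards to the risk team, while the raw events still land in ADLS through Capture.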

Reference 3 - Log & Application Observability

Services: Azure Monitor, Log Analytics, Data Explorer, ADLS Gen2

  App services / AKS / VMs / Activity Log / Defender
                      |
                      v
              Azure Monitor
          (agent + diagnostic settings)
                      |
           +----------+-----------+
           |                      |
           v                      v
   Log Analytics /          ADLS (archive)
   Data Explorer            (cheap, years)
   (hot, 30-90 days)              |
           |                      v
           v                 Synapse Serverless
   KQL dashboards /         for historical
   Grafana / Sentinel       audits

When to pick it: SRE and platform teams, security logs (Sentinel), microservice observability.
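The hot/archive split in this diagram can be sketched as a routing rule plus a search over the recent window. Everything here is a simplified assumption: the 90-day cutoff is the upper end of the "30-90 days" range, and `storage_tier` and `find_request` are hypothetical helpers, not Azure Monitor or KQL APIs.

```python
from datetime import datetime, timedelta

HOT_RETENTION = timedelta(days=90)  # assumption: upper end of the hot window


def storage_tier(logged_at, now):
    """'hot' for recent entries (KQL store), 'archive' for older ones (ADLS)."""
    return "hot" if now - logged_at <= HOT_RETENTION else "archive"


def find_request(entries, request_id, now, lookback=timedelta(days=30)):
    """Return messages inside the lookback window that mention request_id."""
    return [
        entry["message"]
        for entry in entries
        if now - entry["logged_at"] <= lookback and request_id in entry["message"]
    ]


now = datetime(2024, 6, 30)
entries = [
    {"logged_at": datetime(2024, 6, 29), "message": "req-42 timeout calling payments"},
    {"logged_at": datetime(2024, 6, 1), "message": "req-42 retry succeeded"},
    {"logged_at": datetime(2024, 5, 1), "message": "req-42 first seen"},
    {"logged_at": datetime(2023, 1, 1), "message": "req-7 old entry"},
]
hits = find_request(entries, "req-42", now)
print(hits)
```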

Reference 4 - Customer 360

Services: ADLS Gen2, Data Factory, Synapse, Azure ML

  Dynamics 365   Web events    Mobile app    Zendesk / SN
       |            |              |                |
       +------------+------+-------+----------------+
                          |
                          v
                ADLS Gen2 Data Lake (raw)
                          |
                          v  ADF + Purview DQ
                          |
                ADLS Gen2 Data Lake (curated, Delta)
                          |
                          v
         +----------------+------------------+
         |                                    |
         v                                    v
     Synapse                              Azure ML
     Unified Customer                     Features + Models
     table (serving BI)                   (churn, LTV, NBA)
         |                                    |
         v                                    v
      Power BI 360                   Marketing automation
      dashboard                      (personalized offers)

When to pick it: Retail, banking, telecom, subscription SaaS.
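The "unified customer table" step boils down to grouping rows from every source under one key. A minimal sketch, assuming every source already carries the same `customer_id` (real projects usually need an identity-resolution step first); the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def build_customer_360(**sources):
    """Group rows from every named source under their shared customer_id."""
    profiles = defaultdict(dict)
    for source_name, rows in sources.items():
        for row in rows:
            customer_id = row["customer_id"]
            # Keep everything except the join key as the event details.
            details = {k: v for k, v in row.items() if k != "customer_id"}
            profiles[customer_id].setdefault(source_name, []).append(details)
    return dict(profiles)


profiles = build_customer_360(
    crm=[{"customer_id": "c1", "name": "Ada"}],
    web=[{"customer_id": "c1", "page": "/pricing"}],
    support=[
        {"customer_id": "c1", "ticket": "T-9"},
        {"customer_id": "c2", "ticket": "T-10"},
    ],
)
print(profiles["c1"])
```

Variety is the dominant V here, which is why the sketch keeps each source's fields as-is instead of forcing them into one rigid schema up front.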

Reference 5 - ML / Predictive Analytics

Services: ADLS Gen2, Databricks, Azure ML, Functions

  +----------------------+
  |   ADLS / OneLake     |
  |   (historical data)  |
  +----------+-----------+
             |
             v
  +----------------------+
  |  Databricks / Synapse|
  |  Feature engineering |
  +----------+-----------+
             |
             v
  +-------------------------+
  |  Azure ML Feature Store |
  +-----+-----------+-------+
        |           |
  (training)    (serving)
        v           v
  Azure ML       Real-time
  Jobs           endpoint
  (compute       (managed online
   cluster)       inference)
        |           |
        v           v
     Models    Mobile / web app
                (personalized UX)

When to pick it: Recommenders, fraud detection, demand forecasting, dynamic pricing.
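One reason the feature store sits between engineering and both the training and serving branches is that both paths should compute features the same way. The sketch below shows that idea with a single shared function; the feature names and the 90-day window are assumptions, and no real Azure ML API is used.

```python
from datetime import date


def churn_features(order_dates, as_of):
    """Feature definitions shared by the training path and the serving path."""
    if not order_dates:
        return {"days_since_last_order": None, "orders_last_90d": 0}
    last_order = max(order_dates)
    recent = sum(1 for d in order_dates if 0 <= (as_of - d).days < 90)
    return {
        "days_since_last_order": (as_of - last_order).days,
        "orders_last_90d": recent,
    }


history = [date(2024, 1, 15), date(2024, 3, 1)]

# Training path: features computed as of a historical snapshot date.
training_row = churn_features(history, as_of=date(2024, 3, 31))

# Serving path: the same function, called at request time for one customer.
serving_row = churn_features(history, as_of=date(2024, 3, 31))
print(training_row)
```

Because both rows come from the same definition, the model sees the same feature semantics at inference time that it saw during training.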

Reference 6 - Data Mesh on Azure

Services: Azure Policy, ADLS Gen2, Synapse, Azure Data Share

  Domain A (Orders)      Domain B (Marketing)     Domain C (Finance)
  owns its own pipes     owns its own pipes       owns its own pipes
     |                       |                         |
     v                       v                         v
  ADLS + Synapse          ADLS + Synapse           ADLS + Synapse
  (Orders domain)         (Marketing domain)       (Finance domain)
     |                       |                         |
     +-----------+-----------+-------------------------+
                 |
                 v
       +-----------------------------------+
       |  Microsoft Purview  (central cat) |
       |  + Fabric domains + Data Policies |
       +---------------+-------------------+
                       |
          +------------+------------+
          |            |            |
          v            v            v
        Analyst    Data sci.    Executive
       (Synapse)  (Azure ML)   (Power BI / Copilot)

When to pick it: Large enterprises where domain teams must own their data products, but a central platform team guarantees governance.
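The split between domain ownership and central discovery can be sketched as a small in-memory catalog. This is a toy model under stated assumptions: `DataProduct` and `Catalog` are hypothetical classes, not the Purview or Fabric APIs.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    owner_domain: str
    description: str
    readers: set = field(default_factory=set)


class Catalog:
    """Central catalog: domains register products, anyone can discover them."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        self._products[product.name] = product

    def discover(self, term):
        term = term.lower()
        return sorted(
            p.name
            for p in self._products.values()
            if term in p.name.lower() or term in p.description.lower()
        )

    def grant(self, product_name, approver_domain, reader):
        product = self._products[product_name]
        # Only the owning domain may approve access to its own data product.
        if approver_domain != product.owner_domain:
            raise PermissionError(f"only {product.owner_domain} can grant access")
        product.readers.add(reader)

    def can_read(self, product_name, reader):
        return reader in self._products[product_name].readers


catalog = Catalog()
catalog.register(DataProduct("invoices", "Finance", "Billed invoices per customer"))
catalog.register(DataProduct("campaigns", "Marketing", "Campaign spend and reach"))
catalog.grant("invoices", "Finance", "analyst@example.com")
print(catalog.discover("invoice"))
```

Discovery is open to everyone, while the grant check keeps the approval decision with the owning domain.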

Reference Architecture Comparison

  #  Pattern    Primary V           Storage                     Compute                      Serve
  -  ---------  ------------------  --------------------------  ---------------------------  ------------------------
  1  Batch BI   Volume              ADLS + Synapse              ADF, Synapse Dedicated       Power BI
  2  Real-Time  Velocity            Event Hubs + ADLS           Stream Analytics, Functions  Power BI, Data Explorer
  3  Log Obs.   Velocity + Variety  Log Analytics + ADX + ADLS  Azure Monitor                KQL dashboards, Sentinel
  4  360        Variety + Value     ADLS + Synapse              ADF                          Power BI + apps
  5  ML         Value               ADLS                        Databricks, Azure ML         Endpoint in app
  6  Mesh       Veracity + Value    Distributed ADLS            Per-domain                   Purview + Power BI

HitaVir Tech says: "Architects don't memorize 50 services; they memorize 6 shapes. When someone brings a new problem, they map it to a shape first, then pick services to fit. Copy these six patterns. Most of your career, you'll be adapting one of them."

  +==============================================================+
  |             QUIZ  -   TEST  YOUR  UNDERSTANDING              |
  +==============================================================+

The Services Under Test

ADLS Gen2, Synapse, Databricks, Event Hubs, Stream Analytics, Data Shares, Power BI

Answer each question before revealing. No peeking: this is how you build real recall.

Question 1 - Data Lake Fundamentals

Which of the following best describes a data lake?

Answer: B. A lake holds any data in native format; multiple engines (Synapse, Databricks, Power BI, Azure ML) read from it.

Question 2 - Data Warehouse Property

Which property is specific to data warehouses, not data lakes?

Answer: C. Columnar + MPP is the warehouse signature, enabling fast aggregation on billions of rows.

Question 3 - Medallion Zones

Match each zone to its typical reader:

  Zone              Readers
  ----------------  -------
  Raw (Bronze)      ?
  Curated (Silver)  ?
  Analytics (Gold)  ?

Answer: Raw = data engineers only. Curated = analysts and ML engineers. Analytics = business users and BI tools.

Question 4 - Lakehouse Core Service

In an Azure Modern Data Architecture, which service is the "source of truth" storage layer?

Answer: B. ADLS Gen2 (or OneLake in Fabric). Every other analytics engine on Azure reads from it.

Question 5 - Synapse Serverless

What does Synapse Serverless SQL enable?

Answer: C. Serverless SQL lets you query lake data in place with T-SQL, so no duplicate storage is needed.

Question 6 - Governance

Which service provides unified data catalog, lineage, and policy across ADLS, Synapse, Power BI, and even AWS S3?

Answer: B. Microsoft Purview. Azure RBAC is coarse identity; Defender finds PII; Activity Log audits; Purview governs across the whole estate.

Question 7 - Pattern Matching

A fraud team needs to block bad transactions within 200 ms. Which pattern fits?

Answer: B. Real-time analytics with streaming + serverless alerting.

Question 8 - Mirroring

Microsoft Fabric Mirroring between Azure SQL and Fabric eliminates the need to:

Answer: B. Mirroring replicates changes automatically, so there is no pipeline code to maintain.

Question 9 - Business Data Discovery

Which service helps non-engineers discover datasets using business terms instead of table names?

Answer: C. Purview provides a business-friendly catalog, lineage, and classification.

Question 10 - Anti-Pattern

A company stores 5 years of clickstream JSON in ADLS but has no Purview, no Azure Policy, no quality rules. What is this?

Answer: B. A data swamp: no catalog, no governance, no trust. Storage alone is not an architecture.

Score Yourself

  Score  Meaning
  -----  --------------------------------------------------
  9-10   You are ready for production design reviews
  7-8    Solid. Re-read the reference architectures section
  5-6    Review the three architecture chapters and retake
  < 5    Re-do Part 1 first: the 5 Vs are the foundation

HitaVir Tech says: "Don't guess. The questions here are the same ones that show up in every Azure analytics interview. Know them cold."

  +==============================================================+
  |          CONGRATULATIONS  -  PART 2 DONE!                    |
  +==============================================================+

Every Service You Now Know

ADLS Gen2, Synapse, Databricks, Data Factory, Event Hubs, Stream Analytics, Data Explorer, Power BI, Azure ML, Azure OpenAI, AI Search, Cosmos DB

What You Learned

Data Lakes

  - What a lake is (schema-on-read)
  - The medallion zones (raw, curated, analytics)
  - Why lakes become swamps without governance
  - Delta Lake: ACID on ADLS

Data Warehouses

  - Schema-on-write, columnar, MPP
  - Star schemas (fact + dimensions)
  - Synapse Dedicated vs Serverless
  - Fabric Warehouse (the new kid)

Modern Data Architecture (Lakehouse)

  - Scalable data lake (ADLS / OneLake)
  - Purpose-built engines
  - Seamless movement (Mirroring, Shortcuts, Direct Lake)
  - Unified governance (Purview + Entra)
  - Built-in AI / ML (Azure ML + OpenAI)

Six Reference Architectures

  #  Pattern            Core Service
  -  -----------------  -----------------------------
  1  Batch BI           Synapse + Power BI
  2  Real-Time          Event Hubs + Stream Analytics
  3  Log Observability  Data Explorer / Log Analytics
  4  Customer 360       ADLS + ADF + Synapse
  5  ML / Predictive    Azure ML + Databricks
  6  Data Mesh          Purview + Fabric domains

The Mental Upgrade From Part 1 to Part 2

  Part 1 Gave You...                      Part 2 Gave You...
  --------------------------------------  ----------------------------------------
  A diagnostic framework (5 Vs)           Reference architectures
  One service per V                       Combinations that scale
  A basic ADLS → Synapse Serverless lab   The full Lakehouse picture
  What to pick for each bottleneck        Which shape fits each real-world problem

What To Do Next

  1. Draw one of the six reference architectures for a project you know.
  2. Walk a teammate through it; teaching cements understanding.
  3. Extend the Part 1 lab: add an ADF pipeline to move raw/ → curated/ as Parquet.
  4. Read the Azure Well-Architected Framework, Analytics perspective (link in the Appendix).
  5. Try Microsoft Learn for free, hands-on Azure training paths.

Final Thoughts

  +==============================================================+
  |                                                              |
  |   Part 1  =  the 5 Vs  +  one service per V                  |
  |   Part 2  =  three architectures  +  six reference shapes    |
  |                                                              |
  |   Together  =  you can design and defend                     |
  |               an analytics platform on Azure.                |
  |                                                              |
  +==============================================================+

HitaVir Tech says: "You just completed the same journey a new hire at a top cloud team makes in their first three months. Keep the cheat sheet, apply the patterns, and your architecture reviews will sound like a 5-year veteran's. Fundamentals compound."

Welcome to the Lakehouse. See you in Part 3: hands-on ADF + Synapse + Power BI capstone.

- HitaVir Tech

  +==============================================================+
  |               RESOURCES  AND  NEXT  STEPS                    |
  +==============================================================+

Services Referenced in This Appendix

ADLS Gen2, Data Factory, Synapse, Databricks, Event Hubs, Stream Analytics, Data Explorer, Power BI, Azure ML, Data Shares

Official Microsoft Documentation

  Topic                         Where to Read
  ----------------------------  ----------------------------------------------------------------------
  Azure Data Lake Storage Gen2  learn.microsoft.com/azure/storage/blobs/data-lake-storage-introduction
  Microsoft Purview             learn.microsoft.com/purview
  Azure Data Factory            learn.microsoft.com/azure/data-factory
  Azure Synapse Analytics       learn.microsoft.com/azure/synapse-analytics
  Azure Databricks              learn.microsoft.com/azure/databricks
  Azure Event Hubs              learn.microsoft.com/azure/event-hubs
  Azure Stream Analytics        learn.microsoft.com/azure/stream-analytics
  Azure Data Explorer           learn.microsoft.com/azure/data-explorer
  Power BI                      learn.microsoft.com/power-bi
  Azure Machine Learning        learn.microsoft.com/azure/machine-learning
  Microsoft Fabric              learn.microsoft.com/fabric

Microsoft Whitepapers (Free, Highly Recommended)

  Whitepaper                          Why Read It
  ----------------------------------  ------------------------------------
  Azure Well-Architected Framework    The canonical design checklist
  Cloud Adoption Framework - Data     Governance, operating models
  Azure Data Architecture Guide       Reference diagrams, blessed by MS
  Modern Data Warehouse Architecture  The classic lakehouse blueprint
  Microsoft Fabric Whitepaper         Why Fabric, and how it maps to Azure

Free Training

  Resource                                                          Provider
  ----------------------------------------------------------------  -----------------------
  Microsoft Learn - Azure Data Engineer path                        learn.microsoft.com
  DP-203 Azure Data Engineer Associate cert (retired March 2025)    Microsoft Certification
  DP-600 / DP-700 Fabric certs                                      Microsoft Certification
  AI-102 Azure AI Engineer cert                                     Microsoft Certification

Hands-on Labs to Try Next

Books

  Book                                                      Why
  --------------------------------------------------------  --------------------------------------------------
  Designing Data-Intensive Applications - Martin Kleppmann  The single best data engineering book ever written
  The Data Warehouse Toolkit - Ralph Kimball                Star schemas, dimensional modeling, classics
  Fundamentals of Data Engineering - Joe Reis               Modern, cloud-era data engineering
  Azure Data Engineering Cookbook - Ahmad Osama             Practical Azure recipes

Community

  +==============================================================+
  |          AZURE  ANALYTICS  -  PART 2  CHEAT  SHEET           |
  |                  (screenshot and keep)                       |
  +==============================================================+

The Lakehouse Toolbox - At a Glance

ADLS Gen2, Synapse, Databricks, Data Factory, Event Hubs, Data Explorer, Cosmos DB, Data Shares, Key Vault, Defender, Power BI, Azure ML, Azure OpenAI, AI Search

Data Lake in 30 Seconds

  Store ANY data  --->  ADLS Gen2 (raw | curated | analytics)
  Schema-on-READ  --->  decided at query time
  Read by         --->  Synapse, Databricks, Power BI, Azure ML
  Governed by     --->  Microsoft Purview + Azure Policy

Data Warehouse in 30 Seconds

  Columnar + MPP  --->  fast aggregation on billions of rows
  Schema-on-WRITE --->  decided before load
  Star schema     --->  fact + dimensions
  Azure service   --->  Synapse Dedicated / Fabric Warehouse
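The schema-on-read versus schema-on-write contrast from the two summaries above can be sketched in a few lines of Python. The field names, defaults, and both helper functions are assumptions made for illustration; they are not tied to any Azure service.

```python
import json

# Raw lines as they might land in the lake: nothing is rejected at write time.
RAW_LINES = ['{"user": "u1", "amount": "19.99"}', '{"user": "u2"}']


def read_with_schema(lines):
    """Schema-on-read: types and defaults are applied only at query time."""
    for line in lines:
        record = json.loads(line)
        yield {
            "user": str(record["user"]),
            "amount": float(record.get("amount", 0.0)),
        }


def validate_for_load(record):
    """Schema-on-write: a record that breaks the schema is rejected before load."""
    if not isinstance(record.get("user"), str):
        raise ValueError("user must be a string")
    if not isinstance(record.get("amount"), (int, float)):
        raise ValueError("amount must be a number")
    return record


rows = list(read_with_schema(RAW_LINES))
print(rows)
```

The lake path accepts the incomplete second record and fills a default when it is read; the warehouse path would refuse that same record at load time, which is the trade-off between flexibility and guaranteed structure.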

Modern Data Architecture in 30 Seconds

  One lake  +  many engines  +  one catalog  +  one security model  +  native AI/ML

  Pillar                 Services
  ---------------------  ----------------------------------------------
  Scalable lake          ADLS Gen2 / OneLake
  Purpose-built engines  Synapse, Databricks, Data Explorer, Cosmos DB
  Seamless movement      Mirroring, Shortcuts, Direct Lake, ADF
  Unified governance     Purview, Entra ID, Key Vault, Defender, Policy
  Built-in AI            Azure ML, OpenAI, AI Search, Copilot

The Six Reference Architectures

  #  Pattern    When                         Core
  -  ---------  ---------------------------  -----------------------------
  1  Batch BI   Daily dashboards             Synapse + Power BI
  2  Real-Time  Fraud, IoT, live             Event Hubs + Stream Analytics
  3  Log Obs.   Search production logs       Data Explorer / Sentinel
  4  360        Unify customer view          ADLS + ADF + Synapse
  5  ML         Predict & personalize        Azure ML + Databricks
  6  Data Mesh  Enterprise domain ownership  Purview + Fabric

The Golden Rules

The One-Sentence Takeaways

Next Steps

  1. Redraw one of the six architectures for a project in your current job.
  2. Extend the Part 1 lab: add an ADF pipeline to convert CSV to Parquet.
  3. Try Fabric free trial with the sample lakehouse dataset.
  4. Read the Azure Well-Architected Framework end-to-end.
  5. Move to Part 3: Hands-On Lakehouse (ADF + Synapse + Power BI capstone).

Azure service icons used in this codelab are from the official Microsoft Azure Public Service Icons set (V23), freely distributed by Microsoft for use in architecture diagrams and educational materials.