AWS

  +============================================================+
  |                                                            |
  |      AWS ANALYTICS FUNDAMENTALS - PART 2                   |
  |                                                            |
  |  Data Lakes  -  Data Warehouses  -  Modern Architecture    |
  |                                                            |
  |                 Powered by HitaVir Tech                    |
  +============================================================+

Welcome to Fundamentals of Analytics on AWS - Part 2 by HitaVir Tech!

In Part 1 you built the mental model: analytics concepts, the 5 Vs, and the AWS services that solve each V. In Part 2 you will zoom out and learn how those services combine into production architectures used by real companies today.

Where Part 1 Ends and Part 2 Begins

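  +-----------------------------+----------------------------------------------+
  | Part 1 - The Ingredients    | Part 2 - The Recipe                          |
  +-----------------------------+----------------------------------------------+
  | What is analytics / ML?     | What is a data warehouse?                    |
  | The 5 Vs diagnostic         | What is a data lake?                         |
  | One service per V           | How they combine into a Lake House           |
  | S3 → Glue → Athena mini lab | Reference architectures for 6 real use cases |
  +-----------------------------+----------------------------------------------+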

What You Will Master

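  +--------------------------+------------------------------------------------------------------+
  | Pillar                   | Topics                                                           |
  +--------------------------+------------------------------------------------------------------+
  | Data Lakes               | What, why, zones, governance                                     |
  | Data Warehouses          | Columnar MPP, star schemas, Redshift                             |
  | Modern Data Architecture | Lake House: combining both worlds                                |
  | AWS Services             | S3, Lake Formation, Redshift, Glue, Athena, Kinesis, QuickSight  |
  | Reference Architectures  | Batch BI, Streaming, ML, Log analytics, 360° customer, Data mesh |
  | Common Use Cases         | When to pick which pattern                                       |
  +--------------------------+------------------------------------------------------------------+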

Services You Will Meet in Part 2

S3, Glue, Athena, Redshift, EMR, Kinesis, Kinesis Firehose, OpenSearch, Lake Formation, QuickSight, SageMaker, Bedrock, DataZone

Why Architecture Matters More Than Services

  +================================================================+
  |                                                                |
  |   Services are Lego bricks.   Architecture is the castle.      |
  |                                                                |
  |   Any junior can spin up S3 + Redshift.                        |
  |   Seniors know WHEN to use which, WHY, and HOW they join.      |
  |                                                                |
  +================================================================+

Estimated Duration

2-3 hours (concept-heavy, no new hands-on required; uses the Part 1 lab as the anchor)

How to Use This Codelab

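  +------------------------+--------------------------------------------------------------------------------+
  | If you are...          | Do this                                                                        |
  +------------------------+--------------------------------------------------------------------------------+
  | A student new to cloud | Read top-to-bottom; pause at each reference architecture                       |
  | A working engineer     | Skim sections 1-3, deep-read the reference architectures for patterns you ship |
  | A solution architect   | Use the reference diagrams as whiteboard starters with stakeholders            |
  | A reference reader     | Jump to the Quiz, the Cheat Sheet, and the Appendix of Resources               |
  +------------------------+--------------------------------------------------------------------------------+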

💡 HitaVir Tech says: "Services change every few years: before Glue there was Data Pipeline, before Kinesis many teams ran their own Kafka, before SageMaker there was ML on EC2. But the shapes of data architectures stay stable for decades. Master the shapes. You will pick up the services in a week."

What You Should Know

Required

Helpful

Mental Model You Already Have (From Part 1)

S3, Glue, Athena, Redshift, Rekognition, Kinesis, Kinesis Firehose, Lambda, Macie, QuickSight, SageMaker, Bedrock

  +--------------------------------------------------------------+
  |   5 Vs framework          AWS service toolkit                |
  |   -------------------     ----------------------             |
  |   Volume                  S3, Glacier, Redshift, EMR         |
  |   Variety                 Glue, Athena, Rekognition, ...     |
  |   Velocity                Kinesis, Firehose, Lambda, MSK     |
  |   Veracity                DataBrew, Glue DQ, Macie, ...      |
  |   Value                   QuickSight, SageMaker, Forecast    |
  +--------------------------------------------------------------+

In Part 2 we compose these services into proven shapes.

No Paid Resources Required

Part 2 is concept-heavy. Every diagram is annotated with the services you already met in Part 1. The Part 1 hands-on lab is the practical anchor; this codelab teaches the architectures that scale it up.

⚠️ If you choose to experiment with Redshift or Kinesis, they can exceed the free tier quickly. Use serverless modes and destroy resources the same day.

  +==============================================================+
  |         SECTION  1  -  ARCHITECTURES (the big three)         |
  +==============================================================+

The Three Architecture Icons

S3, Redshift, Lake Formation

Three architecture patterns underpin most modern analytics in production:

                    +----------------------+
                    |  1.  DATA LAKE       |
                    |  Store anything,     |
                    |  cheap and forever   |
                    +----------+-----------+
                               |
                               | grew alongside
                               v
                    +----------------------+
                    |  2.  DATA WAREHOUSE  |
                    |  Fast SQL on         |
                    |  curated tables      |
                    +----------+-----------+
                               |
                               | combined into
                               v
                    +----------------------+
                    |  3.  LAKE HOUSE      |
                    |  (Modern Data Arch.) |
                    |  Best of both        |
                    +----------------------+

Part 2 tours each architecture, shows the AWS services that implement it, then demonstrates how real companies blend all three for different use cases.

  +==============================================================+
  |              ARCHITECTURE  1   -   DATA  LAKE                |
  |              "Store first, schema later."                    |
  +==============================================================+

S3, Lake Formation, Glue Catalog

What is a Data Lake?

🪣 A data lake is a centralized repository that stores any type of data (structured, semi-structured, unstructured) at any scale, in its native format, typically on cheap object storage like Amazon S3.

The defining move: you ingest now and decide the schema later (called schema-on-read). Contrast with warehouses, which demand schema-on-write.
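The contrast can be sketched in a few lines of Python. This is only an illustration under assumptions: the two-field schema, the sample records, and the helper names `write_validated` and `read_with_schema` are hypothetical and not part of any AWS service.

```python
import json

# Hypothetical schema: field name -> type used to parse it.
SCHEMA = {"order_id": int, "amount": float}


def write_validated(store: list, record: dict) -> None:
    """Schema-on-write: the record must fit the schema at load time."""
    typed = {name: cast(record[name]) for name, cast in SCHEMA.items()}
    store.append(typed)


def read_with_schema(raw_lines: list) -> list:
    """Schema-on-read: raw text is stored untouched and parsed when queried."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        try:
            rows.append({name: cast(record[name]) for name, cast in SCHEMA.items()})
        except (KeyError, ValueError, TypeError):
            continue  # skip rows that do not fit the schema chosen at read time
    return rows


# The lake keeps every raw line, including the incomplete one.
lake = ['{"order_id": "1", "amount": "9.5"}', '{"order_id": "2"}']
lake_rows = read_with_schema(lake)

# The warehouse refuses the incomplete record when it is written.
warehouse = []
write_validated(warehouse, {"order_id": "1", "amount": "9.5"})
try:
    write_validated(warehouse, {"order_id": "2"})
    rejected = False
except KeyError:
    rejected = True
```

Both paths end with the same clean row. The difference is where the bad record is caught: the lake still holds it for later replay, while the warehouse never accepted it.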

Data Lake - The Core Idea

  +------------------------------------------------------------+
  |                                                            |
  |   ANY DATA  --->   S3 (object store)   --->  ANY ENGINE    |
  |                                                            |
  |   CSV, JSON,        cheap, durable,         Athena,        |
  |   Parquet, logs,    infinite scale,         Redshift,      |
  |   images, PDFs,     one source of truth     EMR,           |
  |   Kafka events                              SageMaker      |
  |                                                            |
  +------------------------------------------------------------+

One lake. Many engines. That is the core promise.

Why Data Lakes Emerged

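Before ~2010, analytics meant warehouses, which were expensive, schema-strict, and row-limited. Then data exploded:

  +--------------------------------------------+---------------------------+
  | Problem With Warehouse-Only World          | Who Felt It               |
  +--------------------------------------------+---------------------------+
  | Warehouse storage cost $1000s / TB / month | Every CFO                 |
  | Could not store PDFs, images, videos       | Healthcare, retail, media |
  | Schema changes took weeks                  | Fast-moving startups      |
  | Historical data deleted to save cost       | Regulated industries      |
  +--------------------------------------------+---------------------------+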

Data lakes fixed this by leveraging cheap object storage (S3 at ~$0.023 / GB / month) and decoupling compute from storage.
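To make the storage price concrete, here is a small worked calculation in Python. It assumes the ~$0.023 per GB per month figure quoted above and 1 TB = 1024 GB. Real bills also include requests, data transfer, and storage-class differences, none of which are modeled here.

```python
# Price quoted in the text above; actual pricing varies by region and storage class.
S3_STANDARD_USD_PER_GB_MONTH = 0.023


def monthly_storage_cost_usd(
    terabytes: float,
    usd_per_gb_month: float = S3_STANDARD_USD_PER_GB_MONTH,
) -> float:
    """Storage-only cost for one month, assuming 1 TB = 1024 GB."""
    gigabytes = terabytes * 1024
    return gigabytes * usd_per_gb_month


cost_10_tb = monthly_storage_cost_usd(10)
print(f"10 TB for one month: about ${cost_10_tb:,.2f}")  # about $235.52
```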

Data Lake Zones - The Medallion Pattern

  s3://hitavirtech-lake/
    |
    +-- raw/          <--  BRONZE:  untouched, as ingested
    |                      + source of truth
    |                      + can replay anything from here
    |
    +-- curated/      <--  SILVER:  cleaned, typed, Parquet
    |                      + deduped, quality-checked
    |                      + partitioned for fast scans
    |
    +-- analytics/    <--  GOLD:    pre-aggregated, BI-ready
                           + joins done once
                           + powers dashboards and ML features

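  +------------------+---------------------------------------------+--------------------------+
  | Zone             | Shape                                       | Readers                  |
  +------------------+---------------------------------------------+--------------------------+
  | Raw / Bronze     | Original bytes: CSV, JSON, images, dumps    | Data engineers only      |
  | Curated / Silver | Cleaned, typed, often Parquet + partitions  | Analysts, ML engineers   |
  | Analytics / Gold | Aggregated, ready for dashboards and models | Business users, BI tools |
  +------------------+---------------------------------------------+--------------------------+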
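One way to keep the zones consistent is to generate object keys from a single helper. The sketch below reuses the bucket and prefix names from the layout above. The dataset name, the date, the file names, and the `zone_key` helper are hypothetical, and the `year=/month=/day=` layout is the common Hive-style partition convention rather than something the text prescribes.

```python
from datetime import date

BUCKET = "hitavirtech-lake"  # bucket name from the zone layout above
ZONES = {"bronze": "raw", "silver": "curated", "gold": "analytics"}


def zone_key(tier: str, dataset: str, day: date, filename: str) -> str:
    """Build an S3 URI with Hive-style date partitions for the given zone."""
    prefix = ZONES[tier]  # raises KeyError for an unknown tier
    return (
        f"s3://{BUCKET}/{prefix}/{dataset}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/{filename}"
    )


raw_key = zone_key("bronze", "orders", date(2024, 3, 5), "orders.csv")
curated_key = zone_key("silver", "orders", date(2024, 3, 5), "part-0000.parquet")
print(raw_key)
print(curated_key)
```

Partitioning by date lets a query engine skip whole prefixes when a query filters on a date range, which is what "partitioned for fast scans" refers to.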

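Data Lake on AWS - The Service Stack

S3, Lake Formation, Glue, Glue Catalog, Athena

  +------------+----------------------------------+-----------------------+
  | Layer      | Purpose                          | Service               |
  +------------+----------------------------------+-----------------------+
  | Storage    | Raw bytes, infinite scale        | Amazon S3             |
  | Governance | Permissions, row/column security | AWS Lake Formation    |
  | Cataloging | Schema + partitions metadata     | AWS Glue Data Catalog |
  | ETL        | Move raw → curated → analytics   | AWS Glue ETL, EMR     |
  | Query      | SQL on lake files                | Amazon Athena         |
  | ML         | Train on lake data directly      | Amazon SageMaker      |
  +------------+----------------------------------+-----------------------+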

Service Spotlight - AWS Lake Formation

  +--------------------------------------------------------------+
  |  AWS  LAKE  FORMATION  -  Governance for S3 Data Lakes       |
  +--------------------------------------------------------------+
  |  Permissions :  Table / column / row-level grants            |
  |  Discovery   :  Blueprint ingestion from DBs, logs           |
  |  Auditing    :  Every access logged to CloudTrail            |
  |  Catalog     :  Wraps the Glue Data Catalog with RBAC        |
  |                                                              |
  |  Turns a raw S3 bucket into a governed, multi-tenant lake.   |
  +--------------------------------------------------------------+

Lake Formation is what lets one S3 bucket serve 20 teams without everyone seeing everyone else's columns.
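As a rough illustration of a column-level grant, the sketch below builds the keyword arguments for the boto3 Lake Formation `grant_permissions` call. The role ARN, database, table, and column names are hypothetical. The call itself is left as a comment so the snippet runs without AWS credentials.

```python
def column_grant_request(principal_arn: str, database: str, table: str, columns: list) -> dict:
    """Keyword arguments for a column-level SELECT grant in Lake Formation."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


request = column_grant_request(
    "arn:aws:iam::111122223333:role/marketing-analyst",  # hypothetical role
    "curated",                                            # hypothetical database
    "customers",                                          # hypothetical table
    ["customer_id", "country"],                           # columns this team may read
)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**request)
```

The marketing role would then see only `customer_id` and `country`, even though the underlying files in S3 contain every column.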

Data Lake Strengths and Weaknesses

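  +---------------------------------------------------+----------------------------------------------------+
  | Strength                                          | Weakness                                           |
  +---------------------------------------------------+----------------------------------------------------+
  | Cheap per GB                                      | Can become a "data swamp" without governance       |
  | Any format                                        | Slower queries than a warehouse on the same data   |
  | Separates storage and compute                     | Schema enforcement is optional (and often skipped) |
  | Multi-engine access (Athena, Spark, Redshift, ML) | Harder for business users to self-serve            |
  +---------------------------------------------------+----------------------------------------------------+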

The Data Swamp - How Lakes Fail

  +--------------------------------------------------------------+
  |                                                              |
  |   NO CATALOG         ->  "Which bucket has customers?"       |
  |   NO QUALITY RULES   ->  "Why are 40% of amounts negative?"  |
  |   NO GOVERNANCE      ->  "Who deleted last quarter's data?"  |
  |   NO LIFECYCLE       ->  "We're paying for 2014 clickstream" |
  |                                                              |
  |             ==>  DATA SWAMP  (useless, expensive)            |
  +--------------------------------------------------------------+

Every successful data lake is paired with Glue Catalog + Lake Formation + quality rules + lifecycle policies. Skip these and your lake drowns.
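A lifecycle policy is the piece most often skipped. The sketch below builds a rule in the shape accepted by the S3 `put_bucket_lifecycle_configuration` API. The 90-day and 2555-day thresholds, the rule ID, and the helper name are illustrative assumptions, and the boto3 call is shown only as a comment.

```python
def raw_zone_lifecycle(archive_after_days: int = 90, expire_after_days: int = 2555) -> dict:
    """Lifecycle configuration: archive raw/ objects to Glacier, then expire them."""
    if expire_after_days <= archive_after_days:
        raise ValueError("expiration must come after the archive transition")
    return {
        "Rules": [
            {
                "ID": "archive-then-expire-raw",        # hypothetical rule name
                "Filter": {"Prefix": "raw/"},           # applies to the raw zone only
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


lifecycle = raw_zone_lifecycle()

# With boto3 installed and credentials configured, it could be applied as:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="hitavirtech-lake", LifecycleConfiguration=lifecycle)
```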

💡 HitaVir Tech says: "A data lake without a catalog is a data swamp. A data lake without quality rules is a liability. Governance is not optional: it is the difference between an asset and a landfill."

🪣 Data lake in one line: store everything cheaply, govern it strictly, query it from any engine.

  +==============================================================+
  |           ARCHITECTURE  2   -   DATA  WAREHOUSE              |
  |           "Fast SQL on curated, trusted data."               |
  +==============================================================+

Redshift, EMR

What is a Data Warehouse?

🏛️ A data warehouse is a centralized, highly structured database optimized for analytical queries (aggregations, joins, and scans across billions of rows) at interactive speeds.

Key properties:

  +-------------------------------------+---------------------------------------------------------+
  | Property                            | What It Means                                           |
  +-------------------------------------+---------------------------------------------------------+
  | Schema-on-write                     | Every row fits a predefined schema at load time         |
  | Columnar storage                    | Stores columns together, not rows; 10-100× faster scans |
  | MPP (Massively Parallel Processing) | Splits work across many nodes automatically             |
  | Optimized for reads                 | Writes are slower; reads are lightning-fast             |
  | Business-user friendly              | Clean star schemas; analysts can self-serve SQL         |
  +-------------------------------------+---------------------------------------------------------+

Row vs Columnar Storage

  ROW STORE (OLTP, e.g. MySQL)
  ----------------------------
  [id | name | country | amount]  <-- each row stored together

  Great for:  "Get everything about order 1042"
  Bad for:    "SUM(amount) across 1B rows"

  COLUMNAR STORE (OLAP, e.g. Redshift, Parquet)
  ---------------------------------------------
  [id][id][id]...
  [name][name][name]...
  [country][country][country]...
  [amount][amount][amount]...      <-- each column stored together

  Great for:  "SUM(amount) across 1B rows"  (scan only one column)
  Bad for:    "Get everything about order 1042"

Warehouses use columnar. That one design choice is why they can aggregate billions of rows in seconds.
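The two layouts can be imitated with plain Python containers. This is a toy model with three hypothetical orders. It shows which values each access pattern has to touch, not how Redshift or Parquet lay out bytes on disk.

```python
# Three hypothetical orders in a row-oriented layout: one dict per row.
row_store = [
    {"id": 1, "name": "Asha", "country": "IN", "amount": 20.0},
    {"id": 2, "name": "Ben", "country": "US", "amount": 35.5},
    {"id": 3, "name": "Chen", "country": "SG", "amount": 10.0},
]

# The same data in a column-oriented layout: one list per column.
column_store = {name: [row[name] for row in row_store] for name in row_store[0]}


def total_amount_rows(rows: list) -> float:
    """Aggregate over a row store: every row is visited to pull out one field."""
    return sum(row["amount"] for row in rows)


def total_amount_columns(columns: dict) -> float:
    """Aggregate over a column store: only the 'amount' column is read."""
    return sum(columns["amount"])


def fetch_order_columns(columns: dict, order_id: int) -> dict:
    """Point lookup in a column store: one read per column to rebuild the row."""
    position = columns["id"].index(order_id)
    return {name: values[position] for name, values in columns.items()}


print(total_amount_rows(row_store), total_amount_columns(column_store))
print(fetch_order_columns(column_store, 2))
```

Both totals agree. The columnar version never looks at `name` or `country`, which is the saving that grows with table width and row count.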

Star Schema - The Warehouse Language

Most warehouse tables follow the star schema:

                    +---------------------+
                    |   DIM_CUSTOMER      |
                    |   (who bought)      |
                    +----------+----------+
                               |
                               |
  +---------------+    +---------------+    +---------------+
  | DIM_PRODUCT   |----|  FACT_SALES   |----|  DIM_DATE     |
  | (what sold)   |    |  (the event)  |    | (when sold)   |
  +---------------+    +-------+-------+    +---------------+
                               |
                               |
                    +----------+----------+
                    |   DIM_STORE         |
                    |   (where sold)      |
                    +---------------------+

Star schemas make queries fast AND readable: SELECT country, SUM(amount) FROM fact_sales JOIN dim_store ....
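A runnable version of that query shape, using Python's built-in `sqlite3` as a stand-in for a warehouse. SQLite is a row store, so this only illustrates the fact-to-dimension join; the table contents and the `store_id` key are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, store_id INTEGER, amount REAL);
    INSERT INTO dim_store VALUES (1, 'IN'), (2, 'US');
    INSERT INTO fact_sales VALUES (1, 1, 20.0), (2, 1, 30.0), (3, 2, 15.0);
    """
)

# The fact table holds the events; the dimension supplies the readable label.
revenue_by_country = conn.execute(
    """
    SELECT d.country, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS d ON d.store_id = f.store_id
    GROUP BY d.country
    ORDER BY d.country
    """
).fetchall()
conn.close()

print(revenue_by_country)  # [('IN', 50.0), ('US', 15.0)]
```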

Service Spotlight - Amazon Redshift

  +--------------------------------------------------------------+
  |  AMAZON  REDSHIFT  -  AWS's Managed Data Warehouse           |
  +--------------------------------------------------------------+
  |  Category     :  Columnar MPP warehouse                      |
  |  Modes        :  Serverless  |  Provisioned (RA3)            |
  |  Scale        :  GBs to petabytes                            |
  |  SQL dialect  :  PostgreSQL-flavored                         |
  |  Superpowers  :  Spectrum, Redshift ML, Data Sharing         |
  |                                                              |
  |  The engine behind Nasdaq, McDonald's, Yelp analytics.       |
  +--------------------------------------------------------------+

Redshift Spectrum - Warehouse Queries Over the Lake

                 +----------------------------+
                 |  Redshift Cluster          |
                 |  (hot, curated tables)     |
                 +-------------+--------------+
                               |
                               | joins across
                               v
                 +----------------------------+
                 |  S3 Data Lake via Spectrum |
                 |  (cold, historical data)   |
                 +----------------------------+

One SQL query spans both the warehouse (recent, hot) and the lake (years of history). No duplicate storage, no duplicate pipelines.

Redshift Loading Patterns

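  +-----------------------------------+---------------------------------+-------+
  | Source                            | Loader                          | Speed |
  +-----------------------------------+---------------------------------+-------+
  | S3 files                          | COPY command (parallel)         | 🚀    |
  | Kinesis Data Streams              | Redshift streaming ingestion    | 🚀    |
  | Operational DBs (MySQL, Postgres) | AWS DMS + CDC                   | 🏃    |
  | SaaS apps (Salesforce, etc.)      | Amazon AppFlow or partner tools | 🚶    |
  +-----------------------------------+---------------------------------+-------+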
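For the first row, a COPY statement can be assembled as below. This is a sketch: the table name, S3 prefix, and role ARN are hypothetical, the helper is not part of any AWS SDK, and it handles only three file formats. It uses plain string formatting, so it is suitable for trusted input only.

```python
ALLOWED_FORMATS = {"PARQUET", "ORC", "CSV"}  # subset chosen for this sketch


def build_copy_statement(
    table: str, s3_prefix: str, iam_role_arn: str, file_format: str = "PARQUET"
) -> str:
    """Return a Redshift COPY statement that loads every file under an S3 prefix."""
    fmt = file_format.upper()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format for this sketch: {file_format}")
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must start with s3://")
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"FORMAT AS {fmt};"
    )


copy_sql = build_copy_statement(
    "analytics.fact_sales",                            # hypothetical target table
    "s3://hitavirtech-lake/curated/sales/",            # curated zone prefix
    "arn:aws:iam::111122223333:role/redshift-copy",    # hypothetical role
)
print(copy_sql)
```

Pointing COPY at a prefix rather than a single file is what lets Redshift split the load across slices in parallel.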

Data Warehouse Strengths and Weaknesses

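  +-------------------------------------------+---------------------------------------+
  | Strength                                  | Weakness                              |
  +-------------------------------------------+---------------------------------------+
  | SQL on billions of rows in seconds        | Expensive per TB stored               |
  | Business-analyst friendly                 | Rigid schema; changes need migrations |
  | Mature BI tool ecosystem                  | Only handles structured data          |
  | ACID transactions and governance baked-in | Locked into one vendor's engine       |
  +-------------------------------------------+---------------------------------------+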

Lake vs Warehouse - The Canonical Comparison

  +--------------------+------------------------+------------------------+
  |  ATTRIBUTE         |  DATA  LAKE            |  DATA  WAREHOUSE       |
  +--------------------+------------------------+------------------------+
  |  Data type         |  Anything              |  Structured only       |
  |  Schema            |  On read               |  On write              |
  |  Cost / TB stored  |  $  (cheap)            |  $$$$  (expensive)     |
  |  Query speed       |  Medium                |  Fast                  |
  |  Users             |  Engineers, data sci.  |  Analysts, business    |
  |  AWS example       |  S3 + Glue + Athena    |  Redshift              |
  +--------------------+------------------------+------------------------+

💡 HitaVir Tech says: "Warehouses are optimized for the answers you know you want. Lakes are optimized for the answers you haven't invented questions for yet. Real companies need both. The next section shows how to stop choosing and combine them."

🏛️ Warehouse in one line: columnar + MPP + star schema = fast answers for business users.

  +==============================================================+
  |       ARCHITECTURE  3   -   MODERN  DATA  ARCHITECTURE       |
  |          (aka the "Lake House" pattern)                      |
  +==============================================================+

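S3, Lake Formation, Redshift, Athena, Glue, QuickSight, SageMaker

The Problem Modern Data Architecture Solves

By 2018, most companies had both a lake and a warehouse, and suffered for it:

  +-------------------------+-----------------------------------------------------------------+
  | Pain                    | Symptom                                                         |
  +-------------------------+-----------------------------------------------------------------+
  | Two copies of the truth | Lake says one number, warehouse says another                    |
  | Pipeline sprawl         | 200 jobs shuffling data between them                            |
  | Permission chaos        | IAM for S3, Redshift grants, separate audits                    |
  | Skill silos             | Data engineers in Spark, analysts in SQL, no common tool        |
  | ML engineers stuck      | Data scientists denied warehouse access, scraping lakes by hand |
  +-------------------------+-----------------------------------------------------------------+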

The Modern Data Architecture Idea

  +==============================================================+
  |                                                              |
  |    ONE GOVERNED PLATFORM                                     |
  |                                                              |
  |    - Unified storage  (S3 data lake = source of truth)       |
  |    - Purpose-built engines  (pick the right tool per job)    |
  |    - Shared catalog + governance  (Lake Formation)           |
  |    - Seamless movement  (Spectrum, Athena, zero-ETL)         |
  |    - Common security model  (IAM + Lake Formation + KMS)     |
  |                                                              |
  +==============================================================+

Instead of lake or warehouse, you get lake and warehouse, unified by one catalog, one permission model, one lineage.

The Five Pillars of Modern Data Architecture

  +----------+   +----------+   +----------+   +----------+   +----------+
  |    1     |   |    2     |   |    3     |   |    4     |   |    5     |
  | SCALABLE |   | PURPOSE- |   | SEAMLESS |   | UNIFIED  |   | FUTURE-  |
  |   DATA   |   |  BUILT   |   |   DATA   |   | GOVERN-  |   |  PROOF   |
  |   LAKE   |   | ENGINES  |   | MOVEMENT |   |   ANCE   |   |    ML    |
  +----------+   +----------+   +----------+   +----------+   +----------+
       |              |              |              |              |
       v              v              v              v              v
     S3 +           Athena,        Spectrum,      Lake Form.     SageMaker,
     Lake           Redshift,      Zero-ETL,      + IAM +        Bedrock,
     Form.          EMR, OS,       Federated      KMS + Q        Glue ML
                    DynamoDB       query

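Pillar 1 - A Scalable Data Lake at the Core

Every modern architecture starts from S3. Why S3?

  +---------------------------------------------------------+--------------------------------------------+
  | Reason                                                  | Impact                                     |
  +---------------------------------------------------------+--------------------------------------------+
  | Virtually unlimited scale                               | Never outgrow it                           |
  | 11 nines durability                                     | Your data is safer than on any single disk |
  | Pennies per GB                                          | Keep history forever                       |
  | Read natively by Athena, Redshift, EMR, Glue, SageMaker | One source, many consumers                 |
  +---------------------------------------------------------+--------------------------------------------+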

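Pillar 2 - Purpose-Built Engines

One-size-fits-all is dead. Pick the right engine per workload:

  +------------------------------+------------------------+----------------------------+
  | Workload                     | Engine                 | Why                        |
  +------------------------------+------------------------+----------------------------+
  | Ad-hoc SQL on lake files     | Athena                 | Pay per query, zero setup  |
  | Dashboards on curated tables | Redshift               | Sub-second BI              |
  | Petabyte Spark jobs          | EMR                    | Custom transforms at scale |
  | Single-digit-ms lookups      | DynamoDB               | Key-value queries          |
  | Full-text + log search       | OpenSearch             | Observability and logs     |
  | Real-time aggregation        | Kinesis Data Analytics | Streaming SQL / Flink      |
  +------------------------------+------------------------+----------------------------+

Kinesis Data Analytics has since been renamed Amazon Managed Service for Apache Flink.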

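Pillar 3 - Seamless Data Movement

Instead of 200 brittle ETL jobs, modern architectures rely on the three mechanisms listed under this pillar in the diagram above: Redshift Spectrum, zero-ETL integrations, and federated query.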

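Pillar 4 - Unified Governance

Lake Formation, IAM, KMS, Macie, CloudTrail

  +---------------------+----------------+-----------------------------------------------------+
  | Layer               | Service        | Purpose                                             |
  +---------------------+----------------+-----------------------------------------------------+
  | Identity            | IAM            | Who you are                                         |
  | Fine-grained access | Lake Formation | What columns / rows you see                         |
  | Encryption          | KMS            | Keys for at-rest encryption (TLS covers in-transit) |
  | PII scanning        | Macie          | Find sensitive data in S3                           |
  | Audit               | CloudTrail     | Every API call, every query                         |
  | Data discovery      | DataZone       | Business-friendly data catalog                      |
  +---------------------+----------------+-----------------------------------------------------+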

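Pillar 5 - Built-in AI / ML

SageMaker, Bedrock

ML is no longer a bolt-on; it lives inside the platform:

  +------------------------------------------+------------------------+
  | Capability                               | Service                |
  +------------------------------------------+------------------------+
  | Full ML lifecycle                        | SageMaker              |
  | Foundation models (Claude, Llama, Titan) | Bedrock                |
  | SQL-native ML in the warehouse           | Redshift ML            |
  | No-code ML in the lake                   | SageMaker Canvas       |
  | Natural-language BI                      | Amazon Q in QuickSight |
  +------------------------------------------+------------------------+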

The Modern Data Architecture - One Picture

  +==================================================================+
  |                                                                  |
  |   +---------+   +---------+   +---------+   +---------+          |
  |   | Batch   |   | Stream  |   | OpTx DB |   | SaaS    |          |
  |   | files   |   | events  |   | (CDC)   |   | apps    |          |
  |   +----+----+   +----+----+   +----+----+   +----+----+          |
  |        |             |             |             |               |
  |        +-------------+-------------+-------------+               |
  |                             |                                    |
  |                             v                                    |
  |     +-----------------------------------------------------+      |
  |     |    AMAZON S3  --- Centralized Data Lake             |      |
  |     |    Raw  ->  Curated  ->  Analytics  (Parquet)       |      |
  |     +---------------------------+-------------------------+      |
  |                                 |                                |
  |                  governed by    |                                |
  |                                 v                                |
  |     +-----------------------------------------------------+      |
  |     |  LAKE FORMATION + GLUE CATALOG + IAM + KMS + MACIE  |      |
  |     +-----------------------------------------------------+      |
  |                                 |                                |
  |       +------------+------------+------------+-------------+     |
  |       |            |            |            |             |     |
  |       v            v            v            v             v     |
  |   +-------+   +---------+    +-----+   +-----------+   +------+  |
  |   | Athena|   |Redshift |    | EMR |   | OpenSearch|   |Sage- |  |
  |   |  SQL  |   |  MPP    |    |Spark|   |   Search  |   |Maker |  |
  |   +---+---+   +----+----+    +--+--+   +-----+-----+   +---+--+  |
  |       |            |            |            |             |     |
  |       +------------+------------+------------+-------------+     |
  |                                 |                                |
  |                                 v                                |
  |            +----------------------------------------+            |
  |            |  VALUE = QuickSight + Amazon Q + apps  |            |
  |            +----------------------------------------+            |
  |                                                                  |
  +==================================================================+

Look carefully: every service from Part 1 has a home. That is modern data architecture.

Modern Data Architecture - In One Sentence

Centralize storage in a governed S3 lake. Use the best engine for each workload. Let data move frictionlessly between them. Secure it all uniformly. Build ML natively on top.

💡 HitaVir Tech says: "Don't build one monolith. Don't build 20 silos. Build one lake, with many engines, one catalog, one security model. That's how AWS's biggest analytics customers run."

🏡 Lake House in one line: one lake for storage, many engines for compute, one catalog for trust.

  +==============================================================+
  |         THE  COMPLETE  AWS  SERVICE  MAP                     |
  |            for Modern Data Architecture                      |
  +==============================================================+

The Headline Cast for Part 2

S3, Lake Formation, Glue, Athena, Redshift, EMR, Kinesis Streams, Firehose, OpenSearch, QuickSight, SageMaker, DataZone

Each service answers a specific question in the modern architecture:

  +------------------+----------------------------------+---------------------------+
  | Layer            | Question                         | Service                   |
  +------------------+----------------------------------+---------------------------+
  | Storage          | Where does my data live?         | Amazon S3                 |
  | Governance       | Who can see what?                | AWS Lake Formation        |
  | Catalog          | What data do we have?            | AWS Glue Data Catalog     |
  | ETL              | How do I shape it?               | AWS Glue ETL, EMR         |
  | SQL on lake      | How do I explore?                | Amazon Athena             |
  | SQL on warehouse | How do I serve BI?               | Amazon Redshift           |
  | Stream ingest    | How do I handle real time?       | Kinesis + Firehose + MSK  |
  | Search           | How do I query logs?             | Amazon OpenSearch Service |
  | BI               | How do people see it?            | Amazon QuickSight         |
  | ML               | How do we predict?               | Amazon SageMaker          |
  | Discovery        | How do business users find data? | Amazon DataZone           |
  +------------------+----------------------------------+---------------------------+

Service Spotlight - Amazon DataZone

  +--------------------------------------------------------------+
  |  AMAZON  DATAZONE  -  Business-Friendly Data Catalog         |
  +--------------------------------------------------------------+
  |  Audience    :  Business analysts, product managers          |
  |  Features    :  Search by business term, request access      |
  |  Integrates  :  Glue Catalog, Redshift, S3, third-party      |
  |  ML assist   :  Auto-generate descriptions for tables        |
  |                                                              |
  |  Bridges the "I need data" gap between teams and engineers.  |
  +--------------------------------------------------------------+

Zero-ETL - The Quiet Revolution

Classic ETL means writing code to extract-transform-load between systems. Zero-ETL makes AWS automatically replicate curated data between services:

  +----------------+           +-----------------------+
  | Aurora MySQL   |  ==ETL==> | Redshift (analytics)  |
  |  (app DB)      |  managed  |                       |
  +----------------+           +-----------------------+

  Supported today:
  - Aurora  -> Redshift
  - RDS     -> Redshift
  - DynamoDB -> OpenSearch
  - S3      -> Redshift  (auto-copy)

Fewer pipelines to maintain. Fresher analytics. Less on-call pain.

AWS Glue - The Connective Tissue

Glue, Crawler, Catalog, DataBrew, Data Quality

In a modern architecture, Glue is everywhere:

  +---------------------------+---------------------------------+
  | Capability                | Role                            |
  +---------------------------+---------------------------------+
  | Crawlers                  | Auto-discover schemas in S3     |
  | Data Catalog              | The shared metadata layer       |
  | ETL jobs (Spark / Python) | Transform raw → curated         |
  | DataBrew                  | No-code cleaning for analysts   |
  | Data Quality              | Rules enforced on every run     |
  | Workflows                 | Orchestrate multi-job pipelines |
  +---------------------------+---------------------------------+
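The idea behind "rules enforced on every run" can be sketched in plain Python. This is not the Glue Data Quality API or its rule language. The rule names, sample rows, and the pass threshold are hypothetical and only show how a pipeline run might be gated on rule results.

```python
# Hypothetical batch of curated rows; two of them violate a rule.
rows = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": -5.0},
    {"order_id": 3, "amount": 12.5},
    {"order_id": None, "amount": 7.0},
]

# Each rule is a name plus a per-row predicate.
RULES = {
    "order_id_present": lambda row: row["order_id"] is not None,
    "amount_non_negative": lambda row: row["amount"] >= 0,
}


def evaluate(batch: list, rules: dict) -> dict:
    """Return the fraction of rows that pass each rule."""
    return {
        name: sum(1 for row in batch if check(row)) / len(batch)
        for name, check in rules.items()
    }


def passes(report: dict, threshold: float = 1.0) -> bool:
    """A run is accepted only if every rule meets the threshold."""
    return all(score >= threshold for score in report.values())


report = evaluate(rows, RULES)
print(report, passes(report))
```

A pipeline that stops promoting data from raw to curated when `passes` returns false never lets the "why are 40% of amounts negative?" question reach a dashboard.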

A Mental Shortcut - The Service Lookup Table

When someone describes a problem, scan this table first:

  +-----------------------------------------------------------+-----------------------------------------------+
  | Problem Sounds Like...                                    | Reach For                                     |
  +-----------------------------------------------------------+-----------------------------------------------+
  | "We have terabytes piling up and need cheap storage"      | S3 + Glacier                                  |
  | "Analysts want SQL on 1B rows, must return in seconds"    | Redshift                                      |
  | "We dump random files hourly, want ad-hoc SQL"            | Athena + Glue                                 |
  | "Events come at 1M / sec and drive a live dashboard"      | Kinesis + Kinesis Analytics                   |
  | "Logs need to be searchable with keyword filters"         | OpenSearch                                    |
  | "We keep 2 copies of the same data in lake and warehouse" | Redshift Spectrum / Zero-ETL                  |
  | "We need to share a slice with another AWS account"       | Redshift Data Sharing / Lake Formation grants |
  | "Non-engineers can't find any data"                       | DataZone                                      |
  | "We want the CEO to ask questions in English"             | Amazon Q in QuickSight                        |
  +-----------------------------------------------------------+-----------------------------------------------+

💡 HitaVir Tech says: "When a junior asks 'which AWS service should we use?', the senior reply is another question: 'what is the actual pattern?' Service choice without pattern = tech for tech's sake."

  +==============================================================+
  |         SECTION  2   -   COMMON  USE  CASES                  |
  |           (where the patterns show up in real life)          |
  +==============================================================+

Most real-world analytics work on AWS falls into six repeatable patterns. Recognize them and you'll know which reference architecture to reach for.

The Service Cast Across All Six Patterns

Redshift, QuickSight, Kinesis, Kinesis Analytics, OpenSearch, S3, Glue, SageMaker, Lake Formation

The Six Patterns

  +-----------------------------------------------------------+
  |  1.  BATCH  BI           - nightly dashboards             |
  |  2.  REAL-TIME  ANALYTICS- live metrics, fraud, IoT       |
  |  3.  LOG  /  APP  OBS.   - search + troubleshoot logs     |
  |  4.  CUSTOMER  360       - unify profiles across sources  |
  |  5.  ML  /  PREDICTIVE   - forecast, recommend, score     |
  |  6.  DATA  MESH          - domain-owned, shared data      |
  +-----------------------------------------------------------+

Use Case 1 - Batch Business Intelligence

Redshift, QuickSight

Who needs it: Every company with a CFO.

Shape: Operational databases + flat files → data lake → warehouse → dashboards.

  +--------------+-----------------+
  | Trait        | Value           |
  +--------------+-----------------+
  | Freshness    | Daily or hourly |
  | Volume       | GB to TB        |
  | Velocity V   | Batch           |
  | Core service | Redshift        |
  +--------------+-----------------+

Example prompt: "Revenue by region compared to last quarter, refreshed every morning at 8 am."

Use Case 2 - Real-Time Analytics

Kinesis, Kinesis Analytics, Lambda

Who needs it: Rideshare, fintech, ad-tech, IoT, online gaming.

Shape: Events → Kinesis → stream processor → live dashboard and S3 for history.

  +--------------+-----------------------------+
  | Trait        | Value                       |
  +--------------+-----------------------------+
  | Freshness    | Sub-second to seconds       |
  | Volume       | Millions of events / sec    |
  | Velocity V   | Streaming                   |
  | Core service | Kinesis + Kinesis Analytics |
  +--------------+-----------------------------+

Example prompt: "Alert the risk team the moment any card transaction looks fraudulent."
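The stream-processor step usually boils down to windowed aggregation. The sketch below counts events per card in fixed 60-second windows and flags windows that exceed a limit. It is a toy, in-memory model: the event tuples, the window size, and the limit are hypothetical, and a real deployment would run this logic in a stream processor rather than over a list.

```python
from collections import defaultdict


def tumbling_window_counts(events: list, window_seconds: int) -> dict:
    """Count events per (window start, key) for fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


def windows_over_limit(counts: dict, limit: int) -> list:
    """Return the (window start, key) pairs whose count exceeds the limit."""
    return sorted(pair for pair, count in counts.items() if count > limit)


# Hypothetical (timestamp in seconds, card id) events.
events = [(0, "card-A"), (12, "card-A"), (30, "card-A"), (45, "card-B"), (61, "card-A")]
counts = tumbling_window_counts(events, window_seconds=60)
alerts = windows_over_limit(counts, limit=2)
print(alerts)  # [(0, 'card-A')]
```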

Use Case 3 - Log & Application Observability

OpenSearch, Kinesis Firehose, CloudTrail

Who needs it: Every engineering team at scale.

Shape: Application logs → Firehose → OpenSearch + S3 archive.

  +--------------+----------------+
  | Trait        | Value          |
  +--------------+----------------+
  | Freshness    | Seconds        |
  | Volume       | TB/day in logs |
  | Velocity V   | Streaming      |
  | Core service | OpenSearch     |
  +--------------+----------------+

Example prompt: "Search the last 30 days of production logs for any mention of this request ID."

Use Case 4 - Customer 360

S3, Glue, Redshift, SageMaker

Who needs it: Retail, banking, telecom, SaaS.

Shape: Unify profiles from CRM, web, mobile, support into one view, served to marketing + ML.

  +--------------+----------------------+
  | Trait        | Value                |
  +--------------+----------------------+
  | Freshness    | Hourly               |
  | Volume       | TB                   |
  | Dominant V   | Variety              |
  | Core service | S3 + Glue + Redshift |
  +--------------+----------------------+

Example prompt: "Show one customer's full lifetime journey: ads seen, orders placed, tickets filed."
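At its core, the unification step is a keyed merge across sources. The sketch below assumes every source already shares a clean `customer_id`, which real identity resolution rarely gets for free. The source names, fields, and the `build_customer_360` helper are hypothetical.

```python
def build_customer_360(sources: dict) -> dict:
    """Merge per-source records into one profile per customer_id.

    Fields are prefixed with the source name so that colliding field
    names from different systems do not overwrite each other.
    """
    profiles = {}
    for source_name, records in sources.items():
        for record in records:
            customer_id = record["customer_id"]
            profile = profiles.setdefault(customer_id, {"customer_id": customer_id})
            for field, value in record.items():
                if field != "customer_id":
                    profile[f"{source_name}_{field}"] = value
    return profiles


# Hypothetical extracts from three systems.
sources = {
    "crm": [{"customer_id": "c1", "name": "Asha"}],
    "web": [{"customer_id": "c1", "last_page": "/pricing"}],
    "support": [
        {"customer_id": "c1", "open_tickets": 2},
        {"customer_id": "c2", "open_tickets": 0},
    ],
}
view = build_customer_360(sources)
print(view["c1"])
```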

Use Case 5 - ML / Predictive Analytics

SageMaker, S3, Lambda

Who needs it: Forecasting, recommendations, fraud, churn, dynamic pricing.

Shape: Lake → feature store → model training → model endpoint → prediction served in app or BI.

  +--------------+--------------------------------------+
  | Trait        | Value                                |
  +--------------+--------------------------------------+
  | Freshness    | Training weekly, inference real-time |
  | Volume       | GB to PB of history                  |
  | Dominant V   | Value                                |
  | Core service | SageMaker + S3                       |
  +--------------+--------------------------------------+

Example prompt: "Predict which customers will churn next month so we can retain them."

Use Case 6 - Data Mesh

Services: Lake Formation, DataZone, S3

Who needs it: Enterprises with many product teams owning their own data.

Shape: Each domain team curates its own data products on S3; a central catalog (Lake Formation + DataZone) makes them discoverable and access-controlled.

  Trait         Value
  ------------  ----------------------------
  Freshness     Per-domain
  Ownership     Distributed to domain teams
  Dominant V    🛡️ Veracity + 💎 Value
  Core service  🏗️ Lake Formation + 🧭 DataZone

Example prompt: "The Finance team owns 'invoices', Marketing owns 'campaigns', but anyone at the company can discover and request access."

Picking the Right Pattern - Decision Cheat

                What is the dominant question?
                               |
     +---------+---------+-----+----+----------+----------+
     |         |         |          |          |          |
  Weekly    Instant   Find a       One      Predict   Federated
   KPIs     alerts   log line   360 view    future    ownership
     |         |         |          |          |          |
     v         v         v          v          v          v
   BATCH     REAL-      LOG     CUSTOMER     ML /       DATA
    BI       TIME      OBS.        360       PRED.      MESH
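
The decision tree above can also be read as a plain lookup table. The following is a minimal Python sketch of that idea; the function name `pick_pattern` and the lower-cased key spellings are assumptions made for illustration and are not part of any AWS tooling.

```python
# Illustrative lookup: dominant question -> reference pattern.
# Keys and function name are assumptions made for this sketch.
PATTERN_BY_QUESTION = {
    "weekly kpis": "Batch BI",
    "instant alerts": "Real-Time Analytics",
    "find a log line": "Log & Application Observability",
    "one 360 view": "Customer 360",
    "predict future": "ML / Predictive Analytics",
    "federated ownership": "Data Mesh",
}


def pick_pattern(dominant_question: str) -> str:
    """Return the reference pattern mapped to a dominant question label."""
    key = dominant_question.strip().lower()
    if key not in PATTERN_BY_QUESTION:
        raise ValueError(f"No pattern mapped for: {dominant_question!r}")
    return PATTERN_BY_QUESTION[key]


print(pick_pattern("Instant alerts"))
```

Raising an error for an unknown question is deliberate: if a problem does not map to one of the six shapes, it deserves a conversation rather than a silent default.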

💡 HitaVir Tech says: "New engineers try to force every problem into the pattern they already know. Seniors look at the dominant V and pick the shape, then fill in services. Diagnose first. Build second."

  +==============================================================+
  |           SECTION  3  -  REFERENCE  ARCHITECTURES            |
  +==============================================================+

For each use case, here is a whiteboard-ready AWS architecture you can copy, adapt, and defend in a design review.

Reference 1 - Batch BI on a Lake House

Services: S3, Glue, Redshift, QuickSight

  Operational DBs (RDS, Aurora)      SaaS apps (Salesforce)
           |                                |
     AWS DMS (CDC)                       AppFlow
           |                                |
           +----------------+---------------+
                            |
                            v
            +--------------------------------------+
            |     Amazon S3 Data Lake              |
            |     raw -> curated (Parquet)         |
            +---------------+----------------------+
                            |
                            v  (Glue ETL, Data Quality, Catalog)
                            |
            +---------------+----------------------+
            |   Amazon Redshift (curated tables)   |
            |   - Star schemas                     |
            |   - Nightly loads via COPY           |
            +---------------+----------------------+
                            |
                            v
                   Amazon QuickSight
                     (SPICE dashboards)
                            |
                            v
                        Executives

When to pick it: Finance, ops, exec reporting. Stable queries, predictable loads.
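
The "Nightly loads via COPY" step in the diagram can be sketched as follows. This is a hedged illustration that only builds the SQL text for a Redshift `COPY` from curated Parquet files; the table name, S3 prefix, and IAM role ARN are hypothetical placeholders, and running the statement against a cluster is left out.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement that loads Parquet files from S3.

    Inputs are assumed to come from trusted configuration, not from users;
    this sketch does no SQL escaping.
    """
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must start with s3://")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Hypothetical names, for illustration only.
statement = build_copy_statement(
    table="sales.fact_orders",
    s3_prefix="s3://example-lake/curated/orders/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole",
)
print(statement)
```

A nightly scheduler would typically submit one such statement per curated table after the Glue job finishes.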

Reference 2 - Real-Time Analytics

Services: Kinesis Streams, Firehose, Kinesis Analytics, Lambda, OpenSearch, S3, QuickSight

  Producers (apps, IoT, clickstream)
             |
             v
   +-------------------------+
   |  Kinesis Data Streams   |
   |  (durable, ordered)     |
   +------------+------------+
                |
    +-----------+------------+------------+
    |                        |            |
    v                        v            v
 Kinesis Analytics         Lambda       Firehose
 (continuous SQL)       (alerting on    |
    |                    anomalies)     v
    |                        |        S3 lake
    v                        v      (Parquet, hist.)
 Live dashboard            SNS/SQS          |
 (QuickSight or              |              v
 OpenSearch dash.)        Ops team      Athena
                          phone          ad-hoc

When to pick it: Fraud detection, live pricing, real-time personalization, IoT.
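
The Lambda branch of the diagram ("alerting on anomalies") might look roughly like the sketch below. It relies on the fact that a Kinesis-triggered Lambda receives each payload base64-encoded under `Records[].kinesis.data`; the JSON fields `card_id` and `amount`, the threshold rule, and the helper `make_record` are assumptions for illustration. Publishing to SNS or SQS is omitted so the example stays self-contained.

```python
import base64
import json

# Hypothetical rule for this sketch: flag any transaction above this amount.
ALERT_THRESHOLD = 10_000


def handler(event, context=None):
    """Decode Kinesis records and return the ones that look suspicious."""
    alerts = []
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded under kinesis.data.
        payload = base64.b64decode(record["kinesis"]["data"])
        txn = json.loads(payload)
        if txn.get("amount", 0) > ALERT_THRESHOLD:
            alerts.append({"card_id": txn.get("card_id"), "amount": txn["amount"]})
    # A deployed function would publish these to SNS or SQS instead.
    return {"alerts": alerts}


def make_record(txn: dict) -> dict:
    """Build a fake Kinesis record for local testing."""
    data = base64.b64encode(json.dumps(txn).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


sample_event = {"Records": [
    make_record({"card_id": "c-1", "amount": 42}),
    make_record({"card_id": "c-2", "amount": 25_000}),
]}
result = handler(sample_event)
print(result)
```

A fixed threshold is only a stand-in; a real fraud rule would more likely call a model endpoint or compare against a rolling baseline.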

Reference 3 - Log & Application Observability

Services: CloudTrail, Firehose, OpenSearch, S3

  App servers / containers / Lambda / CloudTrail
                      |
                      v
              Kinesis Firehose
                      |
           +----------+-----------+
           |                      |
           v                      v
      OpenSearch               S3 (cold)
      (hot, 30 days)           (cheap, years)
           |                      |
           v                      v
      Kibana / OS           Athena for historical
      Dashboards            search and audits

When to pick it: SRE and platform teams, security logs, microservice observability.
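
The hot/cold split in the diagram implies a routing decision at query time. Here is a minimal sketch of that decision, assuming the 30-day hot window shown above; the function name and the returned labels are hypothetical.

```python
from datetime import date, timedelta

HOT_RETENTION_DAYS = 30  # matches the "hot, 30 days" tier in the diagram


def pick_log_store(log_date: date, today: date) -> str:
    """Return which store should answer a query for logs from log_date."""
    if log_date > today:
        raise ValueError("log_date is in the future")
    age = today - log_date
    if age <= timedelta(days=HOT_RETENTION_DAYS):
        return "opensearch"   # recent logs: fast interactive search
    return "athena-on-s3"     # older logs: cheap archive, slower queries


today = date(2024, 6, 30)
print(pick_log_store(date(2024, 6, 15), today))
print(pick_log_store(date(2024, 1, 1), today))
```

Keeping the retention window in one constant makes it easy to keep the router and the OpenSearch index lifecycle policy in agreement.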

Reference 4 - Customer 360

Services: S3, Glue, Redshift, SageMaker

  CRM        Web events     Mobile app     Support tickets
    |            |              |                |
    +------------+------+-------+----------------+
                        |
                        v
             S3 Data Lake (raw)
                        |
                        v  Glue ETL + Data Quality
                        |
             S3 Data Lake (curated)
                        |
                        v
      +-----------------+------------------+
      |                                    |
      v                                    v
   Redshift                          SageMaker
   Unified Customer                  Features + Models
   table (serving BI)               (churn, LTV, NBA)
      |                                    |
      v                                    v
   QuickSight 360                  Marketing automation
   dashboard                       (personalized offers)

When to pick it: Retail, banking, telecom, subscription SaaS.
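
The "unify" step in the middle of this diagram boils down to grouping rows from every source by a shared customer key. The sketch below shows that idea in plain Python; in practice it would run as a Glue job over S3 data. The `customer_id` key, the source names, and the sample rows are assumptions for illustration.

```python
from collections import defaultdict


def build_customer_360(sources: dict) -> dict:
    """Merge per-source rows into one profile per customer_id."""
    profiles = defaultdict(dict)
    for source_name, rows in sources.items():
        for row in rows:
            customer_id = row["customer_id"]
            # Keep each source's fields under its own key so nothing is overwritten.
            fields = {k: v for k, v in row.items() if k != "customer_id"}
            profiles[customer_id].setdefault(source_name, []).append(fields)
    return dict(profiles)


# Hypothetical sample data.
sources = {
    "crm": [{"customer_id": "c-1", "name": "Asha"}],
    "web": [{"customer_id": "c-1", "page": "/pricing"}],
    "support": [
        {"customer_id": "c-1", "ticket": "T-100"},
        {"customer_id": "c-2", "ticket": "T-101"},
    ],
}
profiles = build_customer_360(sources)
print(profiles["c-1"])
```

The hard part in real projects is usually agreeing on the shared key (identity resolution), which this sketch takes as given.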

Reference 5 - ML / Predictive Analytics

Services: S3, Glue, SageMaker, Lambda

  +----------------------+
  |   S3 Data Lake       |
  |   (historical data)  |
  +----------+-----------+
             |
             v
  +----------------------+
  |  Glue / EMR          |
  |  Feature engineering |
  +----------+-----------+
             |
             v
  +-------------------------+
  |  SageMaker Feature Store|
  +-----+-----------+-------+
        |           |
  (training)    (serving)
        v           v
  SageMaker      Real-time
  Training        endpoint
   Jobs           (low-latency)
        |           |
        v           v
     Models    Mobile / web app
                (personalized UX)

When to pick it: Recommenders, fraud detection, demand forecasting, dynamic pricing.
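
The feature store in this diagram feeds two paths: bulk reads for training and key lookups for serving. A minimal sketch of why one shared feature function matters is shown below. It uses plain Python containers as stand-ins for the offline and online stores, not the SageMaker Feature Store API, and the customer fields are hypothetical.

```python
def churn_features(customer: dict) -> dict:
    """Turn one raw customer record into model features (hypothetical fields)."""
    orders = customer.get("orders_last_90d", 0)
    tickets = customer.get("tickets_last_90d", 0)
    return {
        "orders_last_90d": orders,
        # Guard against division by zero for customers with no orders.
        "tickets_per_order": tickets / orders if orders else float(tickets),
    }


history = [
    {"customer_id": "c-1", "orders_last_90d": 4, "tickets_last_90d": 1},
    {"customer_id": "c-2", "orders_last_90d": 0, "tickets_last_90d": 3},
]

# Training path: features are read in bulk.
offline_store = [churn_features(c) for c in history]
# Serving path: features are looked up by key at request time.
online_store = {c["customer_id"]: churn_features(c) for c in history}

print(offline_store[0])
print(online_store["c-2"])
```

Because both paths call the same function, the model sees identically computed features at training and inference time.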

Reference 6 - Data Mesh

Services: Lake Formation, DataZone, S3

  Domain A (Orders)      Domain B (Marketing)     Domain C (Finance)
  owns its own pipes     owns its own pipes       owns its own pipes
     |                       |                         |
     v                       v                         v
  S3 bucket + Glue        S3 bucket + Glue          S3 bucket + Glue
  Catalog (Orders)        Catalog (Marketing)       Catalog (Finance)
     |                       |                         |
     +-----------+-----------+-------------------------+
                 |
                 v
       +-----------------------------------+
       |   Lake Formation  (central RBAC)  |
       |   + Amazon DataZone  (discovery)  |
       +---------------+-------------------+
                       |
          +------------+------------+
          |            |            |
          v            v            v
        Analyst    Data sci.    Executive
       (Athena)   (SageMaker)   (QS / Q)

When to pick it: Large enterprises where domain teams must own their data products, but a central platform team guarantees governance.
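
The publish, discover, and grant flow of a mesh can be modelled in a few lines. The toy catalog below is only a sketch of the ownership rule (domains publish, anyone can search, only the owner approves access); the class and method names are hypothetical and do not mirror the Lake Formation or DataZone APIs.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    owner_domain: str
    description: str
    granted_to: set = field(default_factory=set)


class MeshCatalog:
    """Toy central catalog: domains publish, anyone discovers, owners grant."""

    def __init__(self):
        self._products = {}

    def publish(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def search(self, term: str) -> list:
        term = term.lower()
        return [
            p.name for p in self._products.values()
            if term in p.name.lower() or term in p.description.lower()
        ]

    def grant(self, product_name: str, approver_domain: str, consumer: str) -> None:
        product = self._products[product_name]
        # Only the owning domain may approve access to its product.
        if approver_domain != product.owner_domain:
            raise PermissionError(f"{approver_domain} does not own {product_name}")
        product.granted_to.add(consumer)

    def can_read(self, product_name: str, consumer: str) -> bool:
        return consumer in self._products[product_name].granted_to


catalog = MeshCatalog()
catalog.publish(DataProduct("invoices", "finance", "Paid and open invoices"))
catalog.publish(DataProduct("campaigns", "marketing", "Ad campaign spend"))

print(catalog.search("invoice"))
catalog.grant("invoices", approver_domain="finance", consumer="analyst-1")
print(catalog.can_read("invoices", "analyst-1"))
```

Discovery is open to everyone while access stays with the owning domain, which is the split the diagram assigns to DataZone and Lake Formation respectively.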

Reference Architecture Comparison

  #  Pattern    Primary V           Storage          Compute                    Serve
  -  ---------  ------------------  ---------------  -------------------------  -------------------------
  1  Batch BI   Volume              S3 + Redshift    Glue, Redshift             QuickSight
  2  Real-Time  Velocity            Kinesis + S3     Kinesis Analytics, Lambda  OpenSearch, QuickSight
  3  Log Obs.   Velocity + Variety  OpenSearch + S3  Firehose                   Kibana, Athena
  4  360        Variety + Value     S3 + Redshift    Glue                       QuickSight + apps
  5  ML         Value               S3               Glue, SageMaker            Endpoint in app
  6  Mesh       Veracity + Value    Distributed S3   Per-domain                 Lake Formation + DataZone

💡 HitaVir Tech says: "Architects don't memorize 50 services; they memorize 6 shapes. When someone brings a new problem, they map it to a shape first, then pick services to fit. Copy these six patterns. Most of your career, you'll be adapting one of them."

  +==============================================================+
  |             QUIZ  -   TEST  YOUR  UNDERSTANDING              |
  +==============================================================+

The Services Under Test

S3, Redshift, Glue, Athena, Kinesis, Lake Formation, DataZone, QuickSight

Answer each question before revealing. No peeking: this is how you build real recall.

Question 1 - Data Lake Fundamentals

Which of the following best describes a data lake?

Answer: B. A lake holds any data in native format; multiple engines (Athena, Redshift, EMR, SageMaker) read from it.

Question 2 - Data Warehouse Property

Which property is specific to data warehouses, not data lakes?

Answer: C. Columnar + MPP is the warehouse signature, enabling fast aggregation on billions of rows.

Question 3 - Medallion Zones

Match each zone to its typical reader:

  Zone              Readers
  ----------------  -------
  Raw (Bronze)      ?
  Curated (Silver)  ?
  Analytics (Gold)  ?

Answer: Raw = data engineers only. Curated = analysts and ML engineers. Analytics = business users and BI tools.

Question 4 - Lake House Core Service

In an AWS Modern Data Architecture, which service is the "source of truth" storage layer?

Answer: B. S3. Every other analytics engine on AWS reads from it.

Question 5 - Redshift Spectrum

What does Redshift Spectrum enable?

Answer: C. Spectrum lets you query lake data from the warehouse, with no duplicate storage.

Question 6 - Governance

Which service provides fine-grained (column/row) permissions on the S3 data lake?

Answer: B. Lake Formation. IAM provides coarse identity; Macie finds PII; CloudTrail audits; Lake Formation enforces column and row-level access.

Question 7 - Pattern Matching

A fraud team needs to block bad transactions within 200 ms. Which pattern fits?

Answer: B. Real-time analytics with streaming + serverless alerting.

Question 8 - Zero-ETL

Zero-ETL between Aurora and Redshift eliminates the need to:

Answer: B. Zero-ETL replicates changes automatically, so there is no pipeline code to maintain.

Question 9 - Business Data Discovery

Which service helps non-engineers discover datasets using business terms instead of table names?

Answer: C. DataZone provides a business-friendly catalog on top of the Glue Catalog.

Question 10 - Anti-Pattern

A company stores 5 years of clickstream JSON in S3 but has no Glue Catalog, no Lake Formation, no quality rules. What is this?

Answer: B. A data swamp: no catalog, no governance, no trust. Storage alone is not an architecture.

Score Yourself

  Score  Meaning
  -----  ----------------------------------------------------
  9-10   🎓 You are ready for production design reviews
  7-8    🧠 Solid. Re-read the reference architectures section
  5-6    📚 Review the three architecture chapters and retake
  < 5    🔄 Re-do Part 1 first: the 5 Vs are the foundation

💡 HitaVir Tech says: "Don't guess. The questions here are the same ones that show up in every AWS analytics interview. Know them cold."

  +==============================================================+
  |          CONGRATULATIONS  -  PART 2 DONE!                    |
  +==============================================================+

Every Service You Now Know

S3, Glacier, Redshift, EMR, Glue, Athena, Kinesis, Kinesis Firehose, Lake Formation, OpenSearch, QuickSight, SageMaker, Bedrock, DataZone

What You Learned

🪣 Data Lakes

  Topic                                          Icon
  ---------------------------------------------  ------
  What a lake is (schema-on-read)                📖
  The medallion zones (raw, curated, analytics)  🥉🥈🥇
  Why lakes become swamps without governance     🐊

🏛️ Data Warehouses

  Topic                                  Icon
  -------------------------------------  ----
  Schema-on-write, columnar, MPP         📊
  Star schemas (fact + dimensions)       ⭐
  Redshift serverless vs provisioned     🏛️
  Redshift Spectrum (querying the lake)  🔭

🏡 Modern Data Architecture (Lake House)

  Pillar                                  Icon
  --------------------------------------  ----
  Scalable data lake                      🪣
  Purpose-built engines                   🧰
  Seamless movement (Spectrum, zero-ETL)  🔀
  Unified governance                      🔐
  Built-in AI / ML                        🤖

🗺️ Six Reference Architectures

  #  Pattern            Core Service
  -  -----------------  ----------------------------
  1  Batch BI           🏛️ Redshift + 📊 QuickSight
  2  Real-Time          🌊 Kinesis + 🎯 K. Analytics
  3  Log Observability  🔎 OpenSearch
  4  Customer 360       🪣 S3 + 🕸️ Glue + 🏛️ Redshift
  5  ML / Predictive    🤖 SageMaker + 🪣 S3
  6  Data Mesh          🏗️ Lake Formation + 🧭 DataZone

The Mental Upgrade From Part 1 to Part 2

  Part 1 Gave You...                     Part 2 Gave You...
  -------------------------------------  --------------------------------------------
  📐 A diagnostic framework (5 Vs)        🗺️ Reference architectures
  ☁️ One service per V                    🧩 Combinations that scale
  🛠️ A basic S3 → Glue → Athena lab       🏡 The full Lake House picture
  🎯 What to pick for each bottleneck     🎯 Which shape fits each real-world problem

What To Do Next

  1. 🔎 Draw one of the six reference architectures for a project you know.
  2. 💬 Walk a teammate through it; teaching cements understanding.
  3. 🧪 Extend the Part 1 lab: add Glue ETL to move raw/ → curated/ as Parquet.
  4. 📚 Read the AWS Well-Architected Analytics Lens (link in the Appendix).
  5. 🎓 Try the free AWS Skill Builder course to reinforce the concepts in a different voice.

Final Thoughts

  +==============================================================+
  |                                                              |
  |   Part 1  =  the 5 Vs  +  one service per V                  |
  |   Part 2  =  three architectures  +  six reference shapes    |
  |                                                              |
  |   Together  =  you can design and defend                     |
  |               an analytics platform on AWS.                  |
  |                                                              |
  +==============================================================+

💡 HitaVir Tech says: "You just completed the same journey a new hire at a top cloud team makes in their first three months. Keep the cheat sheet, apply the patterns, and your architecture reviews will sound like a 5-year veteran's. Fundamentals compound."

🎓 Welcome to the Lake House. See you in Part 3: hands-on Glue ETL, Redshift loading, and a capstone project.

- HitaVir Tech ☁️

  +==============================================================+
  |               RESOURCES  AND  NEXT  STEPS                    |
  +==============================================================+

Services Referenced in This Appendix

S3, Glue, Athena, Redshift, Kinesis, OpenSearch, QuickSight, SageMaker, Lake Formation, DataZone

Official AWS Documentation

  Topic               Icon  Where to Read
  ------------------  ----  --------------------------------------
  Amazon S3           🪣    docs.aws.amazon.com/s3
  AWS Lake Formation  🏗️    docs.aws.amazon.com/lake-formation
  AWS Glue            🕸️    docs.aws.amazon.com/glue
  Amazon Athena       🔍    docs.aws.amazon.com/athena
  Amazon Redshift     🏛️    docs.aws.amazon.com/redshift
  Amazon Kinesis      🌊    docs.aws.amazon.com/kinesis
  Amazon OpenSearch   🔎    docs.aws.amazon.com/opensearch-service
  Amazon QuickSight   📊    docs.aws.amazon.com/quicksight
  Amazon SageMaker    🤖    docs.aws.amazon.com/sagemaker
  Amazon DataZone     🧭    docs.aws.amazon.com/datazone

AWS Whitepapers (Free, Highly Recommended)

  Whitepaper                                      Icon  Why Read It
  ----------------------------------------------  ----  ---------------------------------------
  AWS Well-Architected Framework: Analytics Lens  📐    The canonical design checklist
  Modern Data Architecture on AWS                 🏡    The lake house, explained by AWS itself
  Data Analytics Lens: Reference Architectures    🗺️    AWS-blessed reference diagrams
  Building a Data Lake on AWS                     🪣    Zone design, governance patterns
  Streaming Data Solutions                        🌊    Choosing between Kinesis, MSK, Firehose

Free Training

  Resource                                    Icon  Provider
  ------------------------------------------  ----  -----------------------
  AWS Skill Builder: Analytics Learning Plan  🎓    aws.amazon.com/training
  AWS Solutions Architect: Associate path     🎯    AWS Training
  AWS Data Analytics Specialty certification  🏆    AWS Certification

Hands-on Labs to Try Next

Books

  Book                                                         Why
  -----------------------------------------------------------  --------------------------------------------------
  Designing Data-Intensive Applications (Martin Kleppmann)     The single best data engineering book ever written
  The Data Warehouse Toolkit (Ralph Kimball)                   Star schemas, dimensional modeling, classics
  Fundamentals of Data Engineering (Joe Reis)                  Modern, cloud-era data engineering

Community

  +==============================================================+
  |           AWS  ANALYTICS  -  PART 2  CHEAT  SHEET            |
  |                  (screenshot and keep)                       |
  +==============================================================+

The Lake House Toolbox - At a Glance

S3, Redshift, Glue, Athena, EMR, Kinesis, OpenSearch, DynamoDB, Lake Formation, Macie, QuickSight, SageMaker, Bedrock, DataZone

🪣 Data Lake in 30 Seconds

  Store ANY data  --->  S3 (raw | curated | analytics)
  Schema-on-READ  --->  decided at query time
  Read by         --->  Athena, Redshift, EMR, SageMaker
  Governed by     --->  Lake Formation + Glue Catalog

🏛️ Data Warehouse in 30 Seconds

  Columnar + MPP  --->  fast aggregation on billions of rows
  Schema-on-WRITE --->  decided before load
  Star schema     --->  fact + dimensions
  AWS service     --->  Amazon Redshift (serverless or RA3)

🏡 Modern Data Architecture in 30 Seconds

  One lake  +  many engines  +  one catalog  +  one security model  +  native ML

  Pillar                 Icon  Services
  ---------------------  ----  -------------------------------------------
  Scalable lake          🪣    S3
  Purpose-built engines  🧰    Athena, Redshift, EMR, OpenSearch, DynamoDB
  Seamless movement      🔀    Spectrum, Zero-ETL, Federated query
  Unified governance     🔐    Lake Formation, IAM, KMS, Macie, CloudTrail
  Built-in ML            🤖    SageMaker, Redshift ML, Bedrock, Amazon Q

🗺️ The Six Reference Architectures

  #  Pattern    When                         Core
  -  ---------  ---------------------------  ----------------------------
  1  Batch BI   Daily dashboards             🏛️ Redshift + 📊 QuickSight
  2  Real-Time  Fraud, IoT, live             🌊 Kinesis + 🎯 Analytics
  3  Log Obs.   Search production logs       🔎 OpenSearch
  4  360        Unify customer view          🪣 S3 + 🕸️ Glue + 🏛️ Redshift
  5  ML         Predict & personalize        🤖 SageMaker
  6  Data Mesh  Enterprise domain ownership  🏗️ Lake Formation + 🧭 DataZone

🎯 The Golden Rules

🧠 The One-Sentence Takeaways

📈 Next Steps

  1. Redraw one of the six architectures for a project in your current job.
  2. Extend the Part 1 lab: add a Glue ETL job to convert CSV to Parquet.
  3. Try Redshift Serverless with the sample tickit dataset.
  4. Read the AWS Well-Architected Analytics Lens end-to-end.
  5. Move to Part 3: Hands-On Lake House (Glue ETL + Redshift + QuickSight capstone).

AWS service icons used in this codelab are from the official AWS Architecture Icons deck, freely distributed by Amazon Web Services for use in architecture diagrams and educational materials.