🏔️ Data Engineering Learning Journey

A structured, hands-on learning repo documenting my path through modern Data Engineering — starting with Snowflake, expanding into the full DE stack.

👋 Why This Repo Exists

I'm building practical Data Engineering experience by learning in public. This repo collects my SQL scripts, Python experiments, mini-projects, and reflections as I work through each tool and concept.

📓 Deep-dive notes, diagrams & reflections → Notion Knowledge Base

🗺️ Learning Roadmap

Phase 1 — Snowflake (Current)

📓 All Snowflake notes in one place → Snowflake Notion Page

| Week | Topic | Status |
|------|-------|--------|
| 1 | Architecture, Virtual Warehouses, Databases & Schemas | ✅ Done |
| 2 | Data Loading: COPY INTO, Stages, File Formats | ✅ Done |
| 3 | Semi-structured Data: VARIANT, FLATTEN, JSON | 🔄 In Progress |
| 4 | Time Travel, Fail-safe, Cloning | ⬜ Planned |
| 5 | Performance: Clustering, Result Cache, Query Profiling | ⬜ Planned |
| 6 | Snowpipe, Tasks, Streams (CDC) | ⬜ Planned |
| 7 | Snowpark (Python in Snowflake) | ⬜ Planned |
| 8 | RBAC, Data Sharing, Security Best Practices | ⬜ Planned |

Phase 2 — Orchestration (Upcoming)

Apache Airflow — DAGs, Operators, XComs
dbt (data build tool) — Models, Tests, Documentation

Phase 3 — Data Movement & Storage (Planned)

Apache Kafka — Producers, Consumers, Topics
Delta Lake / Iceberg — Table formats
AWS S3 / Azure Blob — Cloud storage patterns

Phase 4 — Capstone Project (Planned)

End-to-end pipeline: ingestion → transformation → serving layer

📁 Repo Structure

data-engineering-journey/
│
├── snowflake/
│   ├── 01-basics/                  # Warehouses, databases, schemas
│   ├── 02-data-loading/            # Stages, COPY INTO, file formats
│   ├── 03-semi-structured/         # JSON, VARIANT, FLATTEN
│   ├── 04-time-travel/             # Time Travel & Fail-safe
│   ├── 05-performance/             # Clustering, caching, query profiling
│   ├── 06-streams-tasks/           # Snowpipe, Tasks, Streams
│   ├── 07-snowpark/                # Python in Snowflake
│   └── projects/                   # Mini end-to-end projects
│
├── airflow/                        # (Coming soon)
├── dbt/                            # (Coming soon)
├── kafka/                          # (Coming soon)
│
└── resources.md                    # Curated links, docs, courses

❄️ Snowflake — Key Concepts Covered

Architecture

Snowflake's three-layer architecture: Storage / Compute / Cloud Services
Virtual Warehouses — sizing, auto-suspend, auto-resume
Multi-cluster warehouses for concurrency scaling
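
As a minimal sketch of the warehouse settings listed above, the Python helper below only builds a `CREATE WAREHOUSE` statement as text; it does not connect to Snowflake. The function name `build_warehouse_ddl`, the warehouse name `learning_wh`, and the chosen values are hypothetical, and the cluster-count options assume an edition that supports multi-cluster warehouses.

```python
def build_warehouse_ddl(
    name: str,
    size: str = "XSMALL",
    auto_suspend_seconds: int = 60,
    auto_resume: bool = True,
    min_clusters: int = 1,
    max_clusters: int = 1,
) -> str:
    """Return a CREATE WAREHOUSE statement as text (hypothetical helper)."""
    if min_clusters < 1 or max_clusters < min_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")
    options = [
        f"WAREHOUSE_SIZE = '{size}'",
        # Suspend after this many idle seconds so idle compute is not billed.
        f"AUTO_SUSPEND = {auto_suspend_seconds}",
        # Resume automatically when the next query arrives.
        f"AUTO_RESUME = {'TRUE' if auto_resume else 'FALSE'}",
        "INITIALLY_SUSPENDED = TRUE",
    ]
    # Only emit cluster counts when more than one cluster is requested.
    if max_clusters > 1:
        options.append(f"MIN_CLUSTER_COUNT = {min_clusters}")
        options.append(f"MAX_CLUSTER_COUNT = {max_clusters}")
    return f"CREATE WAREHOUSE IF NOT EXISTS {name}\n  " + "\n  ".join(options) + ";"


if __name__ == "__main__":
    print(build_warehouse_ddl("learning_wh", max_clusters=3))
```

The generated text can be pasted into a worksheet or passed to whichever client runs the SQL; keeping it as a plain string makes the options easy to compare side by side.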

Data Loading

Internal & External Stages
COPY INTO with CSV, JSON, Parquet
File Format objects and options
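
The loading pieces above fit together roughly as follows. This is a sketch, not the scripts in this repo: the helper `build_load_statements` and the object names passed to it are hypothetical, the CSV options are example values, and the code only returns SQL text without executing it (Python 3.9+ for the `list[str]` annotation).

```python
def build_load_statements(table: str, stage: str, file_format: str) -> list[str]:
    """Return the statements for a basic CSV load (hypothetical helper)."""
    return [
        # Named file format: describes how the staged CSV files are parsed.
        f"CREATE FILE FORMAT IF NOT EXISTS {file_format} "
        "TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1;",
        # Internal named stage that uses the file format by default.
        f"CREATE STAGE IF NOT EXISTS {stage} "
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}');",
        # Bulk load from the stage; stop the load on the first bad record.
        f"COPY INTO {table} FROM @{stage} "
        "ON_ERROR = 'ABORT_STATEMENT';",
    ]


if __name__ == "__main__":
    for statement in build_load_statements("raw_orders", "orders_stage", "csv_fmt"):
        print(statement)
```

With an internal stage, the files still have to be uploaded (for example with `PUT`) before the `COPY INTO` statement has anything to load.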

More concepts added weekly. Follow along ⭐

🛠️ Tools & Stack

| Tool | Purpose | Level |
|------|---------|-------|
| Snowflake | Cloud Data Warehouse | 🔄 Learning |
| SQL | Query & transformation language | ✅ Comfortable |
| Python | Scripting & Snowpark | ✅ Comfortable |
| dbt | Data transformation framework | ⬜ Planned |
| Apache Airflow | Orchestration | ⬜ Planned |
| Apache Kafka | Streaming ingestion | ⬜ Planned |
| Git / GitHub | Version control & learning in public | ✅ Using |
| Docker | Containerisation & local dev environments | 🔄 Learning |
| GitHub Actions | CI/CD pipelines & workflow automation | 🔄 Learning |
| Terraform | Infrastructure as Code (IaC) | ⬜ Planned |
| Linux / Bash | Scripting & server operations | ⬜ Planned |
| Power BI | Dashboard building | ✅ Comfortable |

📓 Notion Knowledge Base

I maintain detailed notes alongside this repo — including:

Concept explanations in my own words
Diagrams & architecture sketches
"Gotchas" and lessons learned
Topic-by-topic summaries

👉 Open Notion Notes

📌 How I Learn (My Process)

1. Read: official docs or a focused tutorial
2. Write: notes in Notion, explained the way I'd teach it
3. Code: reproduce the concept from scratch in this repo
4. Reflect: commit with a meaningful message describing what I learned
5. Review: revisit tricky concepts a week later

📬 Connect

If you're on a similar learning path, I'd love to connect.

💼 LinkedIn
📓 Notion Notes

Updated regularly as I progress. Last updated: 29.04.2026.
