System Design
Welcome to the System Design tutorial. This guide walks you through how to architect software systems that scale, stay reliable, and handle real-world load — with a modern lens on AI-powered applications and the engineering decisions that come with them.
Why Learn System Design in the AI Era?
AI coding agents like Claude Code can scaffold an entire service, write migrations, and wire up infrastructure — all from a single prompt. But they operate within the architecture you define. Deciding whether to shard your database or add a read replica, choosing between a message queue and a synchronous call, estimating whether your system survives 10x traffic on the current design — that judgment still lives with you. The better you understand system design, the better you can direct AI agents to build the right thing, not just a working thing.
System design is the skill that turns code into production software. Without it, technically correct code still fails — it's just slow, expensive, or unreliable instead of broken. Understanding system design enables you to:
- Direct AI agents effectively — Give AI the architectural context it needs to generate code that fits correctly into the larger system, not just code that compiles
- Evaluate AI-generated architecture — Spot when a suggested design doesn't account for scale, cost, or failure modes
- Make trade-off decisions — Choose between SQL and NoSQL, REST and WebSockets, monolith and microservices — based on actual requirements, not hype
- Estimate before you build — Know roughly what something will cost and how it will perform before writing a single line of code
- Design for the AI layer — Understand how context windows, token costs, caching, and model routing are first-class architectural concerns in 2026
Learning Path
Topics build on each other. Start with Foundations to build the core mental models, move through the Building Blocks every production system uses, then go deeper into Reliability, Distribution, and the AI Infrastructure Layer. The curriculum closes with Trade-offs & Production Reality and then Case Studies that tie everything together.
Topics
1. Foundations
The Mental Models
Move from "how code runs" to "how systems behave." Covers non-functional requirements, the CAP theorem, back-of-the-envelope estimation, and basic networking — the vocabulary you need before drawing any diagrams.
Topics covered:
- Scalability vs. Performance vs. Reliability — three properties that are often confused but mean very different things
- Latency vs. Throughput — and Time to First Token (TTFT) as the AI-specific latency metric
- The CAP Theorem and why a distributed system must choose between Consistency and Availability during a network partition
- Back-of-the-envelope math — estimating RPS, storage, and AI token costs before writing any code
- HTTP/3, WebSockets, DNS, and Global Server Load Balancing
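To make the back-of-the-envelope bullet concrete, here is a minimal sketch of the arithmetic in Python. Every number in it (daily requests, bytes per request, tokens per request, price per million tokens) is a hypothetical placeholder, not a figure from this guide or from any real provider's price list.

```python
SECONDS_PER_DAY = 86_400


def average_rps(requests_per_day: float) -> float:
    """Average requests per second, assuming traffic is spread evenly over a day."""
    return requests_per_day / SECONDS_PER_DAY


def daily_storage_gb(requests_per_day: float, bytes_per_request: float) -> float:
    """New data written per day, in gigabytes (1 GB = 10^9 bytes)."""
    return requests_per_day * bytes_per_request / 1e9


def daily_token_cost(requests_per_day: float,
                     tokens_per_request: float,
                     price_per_million_tokens: float) -> float:
    """Daily LLM spend, given a flat price per million tokens."""
    return requests_per_day * tokens_per_request / 1_000_000 * price_per_million_tokens


# Hypothetical workload: 10M requests/day, 2 KB stored per request,
# 1,500 tokens per request at an assumed $2 per million tokens.
requests_per_day = 10_000_000
print(f"avg RPS:     {average_rps(requests_per_day):.0f}")               # 116
print(f"storage/day: {daily_storage_gb(requests_per_day, 2_000):.0f} GB")  # 20 GB
print(f"tokens/day:  ${daily_token_cost(requests_per_day, 1_500, 2.0):,.0f}")  # $30,000
```

Averages hide peaks, so a real estimate would usually multiply the average RPS by an assumed peak factor before sizing anything.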
2. Core Building Blocks
Application Layer
Monolith vs. microservices — when to stay together and when to split.
API Design
REST vs. GraphQL vs. gRPC, versioning, and cursor-based pagination.
Storage
Relational, NoSQL, and object storage — choosing the right home for your data.
Caching & CDNs
Cache-aside vs. write-through, TTLs, and why "just add a cache" can go wrong.
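As a rough illustration of cache-aside with a TTL, the sketch below checks an in-memory cache first and falls back to a loader on a miss or an expired entry. The `CacheAside` class, the `load_from_db` callback, and the injectable `clock` are all hypothetical names chosen for this example; a production setup would typically use a shared cache rather than a local dictionary.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Read-through-on-miss cache: the application owns both cache and loader."""

    def __init__(self, load_from_db: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.load_from_db = load_from_db  # hypothetical stand-in for a database query
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit, still fresh
        # Miss or expired: load from the source of truth, then populate the cache.
        value = self.load_from_db(key)
        self._entries[key] = (value, now + self.ttl_seconds)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read does not serve stale data."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    cache = CacheAside(load_from_db=lambda k: f"row for {k}", ttl_seconds=30)
    print(cache.get("user:1"))  # first call loads from the "database"
    print(cache.get("user:1"))  # second call is served from the cache
```

One way "just add a cache" goes wrong is visible here: if a writer forgets to call `invalidate`, readers can see stale data for up to the full TTL.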
Load Balancing & Traffic
L4 vs. L7 balancing, rate limiting, and the Token Bucket algorithm.
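For a feel of how the Token Bucket algorithm works, here is a minimal single-process sketch. The class name, parameter names, and the injectable `clock` are assumptions made for illustration; a distributed rate limiter would need shared state and atomic updates, which this does not attempt.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill lazily based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(capacity=5, refill_rate=1.0)
    results = [bucket.allow() for _ in range(7)]
    print(results)  # the first 5 calls pass; later ones depend on elapsed time
```

Capacity controls how large a burst is tolerated, while the refill rate sets the sustained request rate.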
3. Data Modeling
Schema Design
Modeling entities and relationships in a relational database.
Indexes
How indexes speed up reads and what they cost in write performance.
Normalization vs. Denormalization
When to keep data tidy vs. when to duplicate it for speed.
Choosing a Storage Engine
The SQL vs. NoSQL decision based on data shape, access patterns, and consistency needs — not hype.
4. Reliability & Scalability
Scaling Up vs. Scaling Out
Vertical vs. horizontal scaling — the first question every growing system faces.
Database Replication
Primary/replica setups that offload reads and buy time before sharding.
Scaling Data
Sharding, partitioning, and avoiding the hot key problem.
Consistency Challenges
Eventual vs. strong consistency, and idempotency to prevent double-charges on retries.
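The idea behind idempotency keys can be sketched in a few lines: the client sends the same key on every retry, and the server returns the stored result instead of charging again. `PaymentService`, `charge`, and the result shape below are hypothetical, and the in-memory dictionary stands in for what would normally be a durable store with an atomic insert-if-absent.

```python
from typing import Dict


class PaymentService:
    """Toy service that deduplicates charges by idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}  # idempotency_key -> stored result
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A retry with a known key returns the original result; no second charge.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": f"ch_{self.charges_made}", "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


if __name__ == "__main__":
    service = PaymentService()
    first = service.charge("order-123", 4_999)
    retry = service.charge("order-123", 4_999)  # e.g. the client timed out and retried
    print(first == retry, service.charges_made)  # True 1
```

This sketch is not safe under concurrent requests; the check and the write would need to happen atomically in a real system.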
Fault Tolerance
Circuit breakers, graceful degradation, and chaos engineering.
5. Distributed Communication
Sync vs. Async
When to use REST/gRPC and when to use message queues.
Event-Driven Design
Pub/Sub vs. event streams (Redis, RabbitMQ, Kafka) and Dead Letter Queues.
Coordination
Distributed locking and leader election in stateful systems.
6. Containers & Deployment
Docker
Packaging your app and its dependencies into a portable, reproducible image.
Container Orchestration
Kubernetes — scheduling, self-healing, and scaling containers across a cluster.
CI/CD Pipelines
How code goes from a commit to production safely and automatically.
Deployment Strategies
Blue/green and canary deployments for zero-downtime releases.
7. The AI Infrastructure Layer
Context Management
The context window as a finite resource — sliding windows, summarization, and retrieval-augmented memory.
RAG
Retrieval-Augmented Generation — ingestion pipelines, hybrid search, and vector databases.
AI Agents
LLM gateways, agentic loops, multi-agent topologies, and failure modes.
Agent Harness Engineering
Building the infrastructure that wraps an LLM into an agent — execution loops, tool dispatch, context management, guardrails, and observability.
Patterns in Claude Code
How Claude Code works under the hood and how to build a minimal version yourself.
8. Security & Observability
Zero-Trust Security
OAuth/JWT for users, machine-to-machine auth for services.
AI Security
Prompt injection prevention and PII masking in LLM logs.
Observability
Logs, Metrics, and Traces (the three pillars), plus SLIs/SLOs for AI-powered features.
9. Trade-offs & Production Reality
Clarifying Requirements
Why the first step is always asking "what problem are we actually solving?"
Build vs. Buy
When to use a managed service vs. hosting your own.
Simplicity vs. Scalability
Don't build for 100M users if you have 100.
The Cost of Complexity
Every new box in your diagram is a new way to fail.
What Diagrams Miss
Budgets, legacy code, org politics, and the hidden costs of "just add a cache."
Operational Excellence
Why "easy to debug" beats "technically perfect."
10. System Design Case Studies
URL Shortener
The classic intro to databases and hashing.
Global Rate Limiter
Distributed state and caching under high-traffic conditions.
Notification System
Delivering alerts reliably across push, email, and SMS at scale.
Real-time Chat App
Mastering WebSockets and Pub/Sub for live messaging.
Social Media Feed
The fanout problem — how one post reaches millions without melting the database.
Video Streaming
Encoding pipelines, adaptive bitrate, and CDN delivery at global scale.
Distributed File Storage
Chunking, replication, and metadata management for petabyte-scale storage.
AI-Powered Customer Support (RAG)
The modern standard, and how to keep the knowledge base updated in real time.