Video Streaming
You press play on a YouTube video and it starts within a second — even on a slow connection. Behind that instant startup is one of the most infrastructure-intensive problems in distributed systems: delivering high-definition video to billions of devices with different screen sizes, network speeds, and hardware capabilities, all without buffering.
Think of it like a restaurant chain that serves 1 billion meals a day. The kitchen (your origin servers) cannot cook every meal to order in real time. Instead, it pre-cooks meals in multiple portion sizes, stores them in local branches (CDN edge nodes) close to customers, and lets each customer pick the portion that fits their appetite (adaptive bitrate). The kitchen's job is preparation; the branches handle delivery.
This case study answers: how does a single uploaded video reach billions of viewers worldwide without melting the infrastructure?
This case study follows the same framework:
- Clarify constraints (Steps 1–2) — What does the system do, and how much traffic must it handle?
- High-level design (Step 3) — What are the major components and how do they connect?
- Deep dives (Steps 4–7) — How do the trickiest parts actually work?
- Trade-offs (Step 8) — What did we give up, and when would we choose differently?
Step 1: Clarify Requirements#
Before designing anything, pin down exactly what the system needs to do and how well it needs to do it.
Functional Requirements#
These describe what the system does.
| Feature | Description | Priority |
|---|---|---|
| Upload videos | Users can upload video files of any common format (MP4, MOV, AVI, MKV). The system validates, processes, and stores them durably. | Core |
| Stream/watch videos | Users can watch videos with instant playback. The video adapts quality in real time based on available bandwidth. | Core |
| Search videos | Users can search by title, description, and tags. Results are ranked by relevance. | Core |
| Video metadata | Each video has a title, description, thumbnail, upload date, view count, and duration. Metadata is editable by the uploader. | Core |
| Like and comment | Users can like videos and post comments; engagement counts are visible on each video page. | Optional |
| Recommendations | The home page and sidebar suggest videos based on watch history, preferences, and trending content. | Optional |
Non-Functional Requirements#
These describe how well the system works.
| Property | Requirement | Why It Matters |
|---|---|---|
| Smooth playback | Adaptive bitrate streaming — video quality adjusts automatically to match the viewer's network speed without manual intervention | Buffering is the #1 reason users abandon a video; ABR eliminates buffering by downgrading quality instead of pausing |
| Scale | Support 1 billion video views per day across 100 million daily active users | YouTube serves over 1 billion hours of video daily; the architecture must handle this without degradation |
| Low startup latency | First frame appears within 2 seconds of pressing play, even on mobile networks | Research shows 53% of mobile users abandon a page that takes over 3 seconds to load — video startup is even more sensitive |
| Global availability | 99.99% uptime; videos playable from any country with minimal latency | A global user base means traffic follows the sun — peak hours rotate through time zones continuously |
| Durability | Uploaded videos must never be lost — zero data loss tolerance | Users invest significant effort creating content; losing a single video destroys trust permanently |
| Upload processing | Uploaded videos are available for viewing within minutes, not hours | Creators expect near-real-time feedback; a multi-hour processing delay discourages uploads |
The central design tension: the upload path and the viewing path have fundamentally different characteristics. Uploads are infrequent (5 million/day), CPU-intensive (transcoding), and latency-tolerant (minutes are acceptable). Views are extremely frequent (1 billion/day), bandwidth-intensive, and latency-sensitive (sub-second startup required). The architecture must separate these two paths completely — the upload pipeline must never compete with the streaming pipeline for resources.
Step 2: Back-of-the-Envelope Estimation#
| Metric | Calculation | Result |
|---|---|---|
| Daily Active Users (DAU) | Assumed (YouTube-scale) | 100 million |
| Video uploads per day | ~5 million new videos uploaded daily | 5 million/day |
| Video views per day | 100M DAU × 10 videos/user/day on average | 1 billion views/day |
| Average video duration | Assumed | 10 minutes |
| Storage per uploaded video (raw) | 1 raw file, ~500 MB average (before transcoding) | ~500 MB |
| Storage per video after transcoding | H.264 at 5 resolutions + H.265 at 3 higher resolutions = 8 variants + audio; ~1.5 GB total | ~1.5 GB |
| Daily storage growth | 5M videos × 1.5 GB each | ~7.5 PB/day |
| Average concurrent streams | Little's Law: 1B views/day ÷ 86,400 sec/day × 300 sec avg watch time (not all viewers finish the full 10 min) | ~3.5 million concurrent |
| Peak concurrent streams | 3.5M × 1.5 (peak-hour concentration) | ~5 million concurrent |
| Bandwidth per stream (average) | 720p at ~3 Mbps average across all quality levels | ~3 Mbps |
| Peak egress bandwidth | 5M concurrent × 3 Mbps | ~15 Tbps |
| CDN cache hit ratio target | Popular videos cached at edge; long-tail fetched from origin | 95%+ hit ratio |
The key insight from the math: 15 Tbps of peak egress bandwidth cannot be served from a single data center or even a handful of data centers. This single number is why CDN is the #1 architectural decision for video streaming — it is not an optimization, it is a fundamental requirement. Without edge caching, the origin servers would need to deliver every byte of every video directly, which is physically and economically impossible at this scale. The CDN serves 95%+ of all video bytes; the origin handles only the first request for each video segment at each edge location.
The storage math is equally striking: 7.5 PB per day means the system accumulates roughly 2.7 exabytes per year. Object storage (like S3 or GCS) with tiered storage classes — frequently accessed videos on hot storage, older videos on cold/archive storage — is the only economically viable approach.
Step 3: High-Level Design#
A video streaming system has two fundamentally separate paths that must be designed independently:
- Upload/Processing path: Creator uploads video → API servers validate → message queue → transcoding workers produce multiple resolutions → segments stored in object storage → metadata updated (slow, CPU-intensive, latency-tolerant)
- Viewing/Streaming path: Viewer requests video → CDN serves cached segments → origin fetches on cache miss → client uses adaptive bitrate to pick quality (fast, bandwidth-intensive, latency-critical)
What each component does:
- Upload Service — Accepts video uploads via a resumable upload protocol (so large files survive network interruptions). Stores the raw file in a temporary S3 staging bucket and publishes a
video_uploadedevent to Kafka. Returns202 Acceptedimmediately — transcoding happens asynchronously in the background. - Kafka — Holds
video_uploadedevents until transcoding workers process them. Decouples the fast upload acceptance from the slow, CPU-intensive transcoding work. If a worker crashes mid-transcode, Kafka re-delivers the event to another worker. - Transcoding Workers — A horizontally-scaled consumer group that reads upload events from Kafka. For each video, they extract audio, split the video into segments, transcode each segment into multiple resolutions and codecs in parallel, and write the output segments to origin object storage. Once complete, they update the video's status in the Metadata Service to "ready."
- Origin Object Storage (S3/GCS) — The durable, permanent store for all transcoded video segments. Organized by video ID, resolution, and segment number. The CDN pulls segments from here on cache miss.
- CDN Edge Nodes — Geographically distributed cache servers that store and serve video segments close to viewers. A popular video's segments are cached at edge nodes worldwide; a long-tail video is fetched from origin on the first request at each edge location and then cached.
- Metadata Service — Manages video metadata (title, description, status, thumbnail URL, available resolutions). Backed by PostgreSQL with a Redis cache for fast reads. The streaming client fetches the HLS manifest URL from this service before playback begins.
- Search Service — Powers video search using Elasticsearch for full-text indexing on titles, descriptions, and tags.
API Design#
| Endpoint | Method | Request / Response | Notes |
|---|---|---|---|
POST /api/v1/videos | POST | Body: { title, description, tags[] } → 201 { video_id, upload_url } | Creates video metadata record with status 'pending'; returns a pre-signed upload URL for the client to upload the file directly to S3 |
PUT /api/v1/videos/{id}/upload | PUT | Binary video file → 202 Accepted { status: 'processing' } | Resumable upload to S3 staging bucket; triggers transcoding pipeline via Kafka event |
GET /api/v1/videos/{id} | GET | → 200 { video_id, title, manifest_url, status, resolutions[] } | Returns metadata including the HLS manifest URL; client uses manifest to begin adaptive streaming |
GET /api/v1/videos/{id}/stream | GET | → 200 HLS manifest (.m3u8) | Returns the master manifest listing all available quality levels and their segment playlists; actual video bytes are served from CDN |
GET /api/v1/search?q=term | GET | → 200 { results[], total, next_cursor } | Full-text search on title, description, tags via Elasticsearch; cursor-based pagination |
POST /api/v1/videos/{id}/like | POST | → 200 { like_count } | Atomic increment of denormalized like count; composite primary key prevents duplicate likes |
Why a pre-signed upload URL? Video files are large (hundreds of MBs to several GBs). Routing them through API servers would consume massive bandwidth and memory on servers that also handle metadata requests. Instead, the API server generates a pre-signed S3 URL — a temporary, authenticated URL that lets the client upload directly to object storage, bypassing the API server entirely. The API server stays lightweight and handles only metadata operations.
Why resumable uploads? A 2 GB video upload on a mobile network can easily be interrupted by a dropped connection. Without resumable uploads, the user must restart from scratch. Resumable upload protocols (like the tus protocol or S3 multipart upload) split the file into chunks and track which chunks have been received, so the upload can resume from where it left off after an interruption.
Step 4: Deep Dive — Video Transcoding Pipeline#
When a creator uploads a video, the raw file is just the beginning. The system must convert that single file into dozens of versions — different resolutions, different codecs, different segment durations — so that every device on every network can play it smoothly.
Think of transcoding like translating a book into multiple languages and formats simultaneously: paperback, audiobook, large-print edition, and e-book. The original manuscript is written once; the publisher produces all the variants needed for every possible reader.
Why transcode at all? A 4K video looks great on a large monitor with fiber internet, but it would be completely unwatchable on a phone with a 3G connection — the data would arrive slower than the video plays, causing constant buffering. By producing the same video at multiple resolutions (360p through 4K), the system lets each viewer receive exactly the quality their device and network can handle. The client automatically switches between quality levels as conditions change — this is adaptive bitrate streaming, covered in Step 5.
The DAG pipeline: Transcoding is structured as a directed acyclic graph (DAG) of tasks — a set of processing steps where some depend on others and some can run in parallel. Audio extraction runs independently of video segmentation. Each segment can be transcoded to all resolutions in parallel across multiple workers. This parallelism is critical: a 10-minute video split into 150 four-second segments can be transcoded by 150 workers simultaneously, completing in minutes instead of hours.
Video Transcoding: Codec and Resolution Strategy
The transcoding pipeline must decide which codecs and resolutions to produce. More variants mean better viewer experience but exponentially more storage and compute cost. Production systems choose a practical subset based on device analytics and cost constraints.
Storage organization: Transcoded segments are stored in object storage with a predictable path structure: videos/{video_id}/{codec}/{resolution}/segment_{number}.ts. This structure makes it trivial for the CDN to construct the URL for any segment of any video at any quality level. The HLS manifest file references these paths, and the client requests them one at a time during playback.
Failure handling: If a transcoding worker crashes mid-processing, Kafka re-delivers the video_uploaded event to another worker. The replacement worker checks which segments have already been written to object storage and resumes from where the previous worker left off. Object storage writes are idempotent — re-writing the same segment with the same content has no side effect.
Step 5: Deep Dive — Adaptive Bitrate Streaming (ABR)#
Adaptive bitrate streaming is the technology that makes video playback smooth across wildly varying network conditions. Without it, a viewer would have to manually choose "720p" or "1080p" before pressing play — and if their bandwidth dropped mid-video, playback would freeze.
Think of ABR like a highway with multiple lanes. When traffic is flowing freely, you drive in the fast lane (high quality). When congestion hits, you seamlessly merge into a slower lane (lower quality) — the car keeps moving, just at a reduced speed. The key insight: a brief quality reduction is always preferable to a complete stop (buffering).
How ABR works, step by step:
- The video is split into small segments (typically 2-10 seconds each) during transcoding. Each segment is independently encoded at every quality level.
- A master manifest file (
.m3u8for HLS,.mpdfor DASH) lists all available quality levels and their bitrates. - When the player starts, it downloads the manifest and selects an initial quality — usually a conservative one to minimize startup time.
- After downloading each segment, the player measures how long the download took and estimates available bandwidth.
- For the next segment, the player picks the highest quality level whose bitrate is below the estimated bandwidth.
- If bandwidth drops, the player switches to a lower quality for the next segment. If bandwidth improves, it switches up. This happens seamlessly — the viewer sees a brief quality change but never a pause.
HLS vs. DASH: Streaming Protocol Comparison
HLS (HTTP Live Streaming) and DASH (Dynamic Adaptive Streaming over HTTP) are the two dominant adaptive streaming protocols. Both work on the same principle — chunked video segments served over plain HTTP — but differ in origin, format, and ecosystem support.
The manifest file in detail: An HLS master manifest is a plain-text file that lists all available quality levels with their bitrates. Here is a simplified example:
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2400000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
Each quality-level playlist then lists the individual segment files:
#EXTM3U
#EXT-X-TARGETDURATION:4
#EXTINF:4.0,
segment_001.ts
#EXTINF:4.0,
segment_002.ts
The video player reads the master manifest, measures its bandwidth, selects the appropriate quality playlist, and starts fetching segments. Everything is plain HTTP — no special streaming protocol needed. This is why video streaming works through any CDN, proxy, or firewall that supports HTTP.
Step 6: Deep Dive — CDN and Global Distribution#
The CDN is the most important architectural decision in the entire system. At 15 Tbps of peak egress bandwidth, no origin infrastructure can serve video bytes directly to all viewers. The CDN absorbs 95%+ of all traffic, making the origin's job manageable.
Think of a CDN like a franchise model for a library. The central library (origin) holds every book ever published. Branch libraries (edge nodes) in every neighborhood stock copies of the most popular books. When a reader wants a bestseller, they get it instantly from their local branch. When they want a rare title, the branch requests it from the central library, reads it, and keeps a copy for the next reader who asks.
The tiered caching hierarchy: Production CDNs use multiple cache tiers to minimize origin fetches. Edge nodes are closest to viewers (lowest latency, smallest cache). Mid-tier nodes sit between edge and origin (larger cache, fewer locations). A request that misses the edge cache checks the mid-tier before going all the way to origin. This reduces origin traffic by an additional 30-50% compared to a single-tier CDN.
CDN Caching Strategy: Push vs. Pull
CDN caching follows two models: pull-through (edge fetches on first request, then caches) and push/pre-position (popular content is proactively placed at edge before any viewer requests it). Production systems use both — push for predictable demand, pull for everything else.
Netflix Open Connect: Netflix takes CDN to its logical extreme by placing dedicated hardware appliances (called Open Connect Appliances, or OCAs) inside ISP networks worldwide. During off-peak hours (typically 2am-8am local time), Netflix pre-fills these appliances with content predicted to be popular the next day. When a subscriber presses play, the video is served from a box sitting inside their own ISP's network — often just one network hop away. This eliminates transit costs and delivers the lowest possible latency. YouTube operates a similar system called Google Global Cache (GGC).
CDN cache key design: Each cached object needs a unique key. For video segments, the cache key is typically: /{video_id}/{codec}/{resolution}/segment_{number}.ts. This means the same segment at different quality levels is cached independently — the CDN does not need to understand video formats, it just caches HTTP responses keyed by URL path.
Step 7: Deep Dive — Video Metadata and Search#
While video bytes flow through CDN and object storage, everything else about a video — its title, description, tags, view count, comments, and the creator who uploaded it — lives in a traditional database. This metadata system must support fast reads (every video page load), reliable writes (view count increments at 1B/day), and full-text search.
View counting at scale: At 1 billion views per day (~11,500 view increments per second), updating the database directly for every view would create massive write contention on popular videos. Instead, view counts are accumulated in Redis using atomic INCR operations and flushed to PostgreSQL in batches every 30 seconds. This means the view count displayed to users may lag reality by up to 30 seconds — an acceptable trade-off for eliminating database contention. YouTube itself shows view counts that can lag by hours; exact real-time accuracy is not required.
Video Metadata Schema and Search
The metadata layer must serve two access patterns: point lookups by video ID (for video pages) and full-text search across titles and descriptions (for the search bar). PostgreSQL handles the first pattern; Elasticsearch handles the second. Keeping both in sync requires a change data capture (CDC) pipeline.
Database schema (simplified):
CREATE TABLE videos (
video_id BIGINT PRIMARY KEY,
uploader_id BIGINT NOT NULL REFERENCES users(id),
title VARCHAR(500) NOT NULL,
description TEXT,
tags TEXT[],
status VARCHAR(20) NOT NULL DEFAULT 'pending', -- pending, processing, ready, failed
duration_sec INT,
manifest_url TEXT, -- HLS manifest path in object storage
thumbnail_url TEXT,
view_count BIGINT NOT NULL DEFAULT 0,
like_count BIGINT NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL,
updated_at TIMESTAMPTZ NOT NULL
);
CREATE INDEX ON videos (uploader_id, created_at DESC);
CREATE INDEX ON videos (status) WHERE status != 'ready'; -- partial index for processing pipeline
Step 8: Trade-offs and Production Reality#
Every architectural decision in this system involves a deliberate trade-off. The table below summarizes what each choice optimizes and what it gives up.
| Design Decision | What It Optimizes | What It Sacrifices |
|---|---|---|
| Separate upload and viewing paths | Upload processing (CPU-intensive) never competes with streaming (bandwidth-intensive) for resources | Two independent systems to build, deploy, and monitor — more operational complexity |
| CDN-first architecture | 95%+ of video bytes served from edge, sub-100ms latency globally, origin load reduced by 20x | CDN costs are the largest line item in the budget; cache invalidation for updated/deleted content requires coordination |
| Multi-codec transcoding (H.264 + H.265 + AV1) | Every device gets the best quality it can decode; newer codecs save 30-50% bandwidth | Storage multiplied per codec; transcoding compute cost increases linearly with each additional codec |
| 4-second HLS segments | Good balance of startup latency (~4s to first segment) and compression efficiency | Slightly higher startup latency than 2-second segments; quality switches take 4 seconds to take effect |
| Adaptive bitrate streaming | Smooth playback across all network conditions — quality degrades gracefully instead of buffering | Quality fluctuations are visible to attentive viewers; the player's bandwidth estimation adds complexity |
| Elasticsearch for search | Millisecond full-text search with relevance ranking across billions of videos | Eventually consistent with PostgreSQL; CDC pipeline is another failure point to monitor |
| Redis for view counts | Absorbs 11,500+ increments/second without database contention | View counts lag reality by up to 30 seconds; Redis failure could temporarily lose in-flight counts |
| Pre-signed upload URLs | Large file uploads bypass API servers entirely, keeping them lightweight for metadata | Clients must handle the two-step flow (get URL, then upload); upload progress tracking is more complex |
What this design intentionally excludes#
Production video streaming platforms include additional layers that are out of scope for this case study:
- DRM (Digital Rights Management) — encryption of video segments with license key servers (Widevine, FairPlay, PlayReady) to prevent unauthorized downloading and redistribution
- Live streaming — real-time ingest, transcoding, and distribution with sub-10-second latency requirements (a fundamentally different architecture from video-on-demand)
- Monetization and ad insertion — server-side ad insertion (SSAI) that stitches ad segments into the video stream, requiring manifest manipulation per viewer
- Content moderation — automated detection of policy-violating content using ML models during or after upload processing
- Offline download — encrypted local storage of video segments for offline playback with expiry and license management
Each of these adds significant complexity but does not fundamentally change the core upload-transcode-CDN-stream architecture described here.
When Would You Design This Differently?#
| Scenario | Simpler Approach | Upgrade Trigger |
|---|---|---|
| Startup: under 10K DAU | Skip transcoding — serve the original upload directly. Use a single CDN. No adaptive bitrate needed. | When users on slow connections report buffering, or when CDN egress costs exceed the cost of transcoding |
| Short-form video (TikTok-like) | Transcode to fewer resolutions (360p, 720p, 1080p only). Use 2-second segments for faster startup. Aggressive CDN pre-positioning since content is small. | When you expand to long-form content where 4K and segment-level quality switching matter |
| Internal corporate video | Single resolution, single codec. No CDN — serve from origin within the corporate network. | When employees access videos from remote offices or mobile devices with varying bandwidth |
| Educational platform (small catalog) | Pre-transcode everything at upload time. No need for a transcoding queue — synchronous processing is fine for minutes-long waits. | When upload volume exceeds what synchronous processing can handle, or when processing blocks the upload API |
Summary#
| Component | Design Decision | Key Reasoning |
|---|---|---|
| Upload pipeline | Resumable upload → S3 staging → Kafka → transcoding workers → object storage | Decouples upload acceptance (fast) from transcoding (slow, CPU-intensive); Kafka ensures no upload is lost if a worker crashes |
| Transcoding | DAG pipeline producing H.264 (all resolutions) + H.265/AV1 (high resolutions) | Parallel segment transcoding reduces processing time from hours to minutes; multi-codec supports all devices while minimizing bandwidth |
| Streaming protocol | HLS with 4-second segments and adaptive bitrate | Universal device support; ABR ensures smooth playback by switching quality based on real-time bandwidth measurement |
| CDN | Multi-tier CDN serving 95%+ of video bytes; pull-through for most content, pre-positioning for trending videos | 15 Tbps peak bandwidth is physically unservable from origin; CDN is the foundational requirement, not an optimization |
| Metadata | PostgreSQL + Redis cache for video metadata; Elasticsearch for search; Redis INCR for view counts | ACID guarantees for creator content; sub-millisecond reads via cache; full-text search via purpose-built engine |
| Storage | S3/GCS object storage with tiered storage classes | Only economically viable option for exabyte-scale video storage; hot tier for recent/popular content, cold tier for archive |
| API design | Pre-signed upload URLs; separate endpoints for metadata vs. streaming | Large uploads bypass API servers; viewing path is CDN-first with API serving only manifests and metadata |
| Search | Elasticsearch synced via CDC from PostgreSQL | Avoids dual-write inconsistency; Elasticsearch is rebuildable from the PostgreSQL source of truth at any time |
The video streaming system is, at its core, a content preparation and delivery pipeline. The upload path converts a single raw file into dozens of optimized variants. The viewing path serves those variants from locations as close to the viewer as possible. The CDN is the architectural keystone — everything else in the system exists to feed content into the CDN efficiently and serve metadata alongside it. Master the separation of upload processing from viewing delivery, and the rest of the design follows naturally.
Sources: