Video Streaming

You press play on a YouTube video and it starts within a second — even on a slow connection. Behind that instant startup is one of the most infrastructure-intensive problems in distributed systems: delivering high-definition video to billions of devices with different screen sizes, network speeds, and hardware capabilities, all without buffering.

Think of it like a restaurant chain that serves 1 billion meals a day. The kitchen (your origin servers) cannot cook every meal to order in real time. Instead, it pre-cooks meals in multiple portion sizes, stores them in local branches (CDN edge nodes) close to customers, and lets each customer pick the portion that fits their appetite (adaptive bitrate). The kitchen's job is preparation; the branches handle delivery.

This case study answers: how does a single uploaded video reach billions of viewers worldwide without melting the infrastructure?

This case study follows a four-part framework:

Clarify constraints (Steps 1–2) — What does the system do, and how much traffic must it handle?
High-level design (Step 3) — What are the major components and how do they connect?
Deep dives (Steps 4–7) — How do the trickiest parts actually work?
Trade-offs (Step 8) — What did we give up, and when would we choose differently?

Step 1: Clarify Requirements

Before designing anything, pin down exactly what the system needs to do and how well it needs to do it.

Functional Requirements

These describe what the system does.

Feature	Description	Priority
Upload videos	Users can upload video files of any common format (MP4, MOV, AVI, MKV). The system validates, processes, and stores them durably.	Core
Stream/watch videos	Users can watch videos with instant playback. The video adapts quality in real time based on available bandwidth.	Core
Search videos	Users can search by title, description, and tags. Results are ranked by relevance.	Core
Video metadata	Each video has a title, description, thumbnail, upload date, view count, and duration. Metadata is editable by the uploader.	Core
Like and comment	Users can like videos and post comments; engagement counts are visible on each video page.	Optional
Recommendations	The home page and sidebar suggest videos based on watch history, preferences, and trending content.	Optional

Non-Functional Requirements

These describe how well the system works.

Property	Requirement	Why It Matters
Smooth playback	Adaptive bitrate streaming: video quality adjusts automatically to match the viewer's network speed without manual intervention	Buffering is the #1 reason users abandon a video; ABR avoids most buffering by downgrading quality instead of pausing
Scale	Support 1 billion video views per day across 100 million daily active users	YouTube serves over 1 billion hours of video daily; the architecture must handle this without degradation
Low startup latency	First frame appears within 2 seconds of pressing play, even on mobile networks	Research shows 53% of mobile users abandon a page that takes over 3 seconds to load — video startup is even more sensitive
Global availability	99.99% uptime; videos playable from any country with minimal latency	A global user base means traffic follows the sun — peak hours rotate through time zones continuously
Durability	Uploaded videos must never be lost — zero data loss tolerance	Users invest significant effort creating content; losing a single video destroys trust permanently
Upload processing	Uploaded videos are available for viewing within minutes, not hours	Creators expect near-real-time feedback; a multi-hour processing delay discourages uploads

The central design tension: the upload path and the viewing path have fundamentally different characteristics. Uploads are infrequent (5 million/day), CPU-intensive (transcoding), and latency-tolerant (minutes are acceptable). Views are extremely frequent (1 billion/day), bandwidth-intensive, and latency-sensitive (first frame within 2 seconds). The architecture must separate these two paths completely, so that the upload pipeline never competes with the streaming pipeline for resources.

Step 2: Back-of-the-Envelope Estimation

Metric	Calculation	Result
Daily Active Users (DAU)	Assumed (YouTube-scale)	100 million
Video uploads per day	~5 million new videos uploaded daily	5 million/day
Video views per day	100M DAU × 10 videos/user/day on average	1 billion views/day
Average video duration	Assumed	10 minutes
Storage per uploaded video (raw)	1 raw file, ~500 MB average (before transcoding)	~500 MB
Storage per video after transcoding	H.264 at 5 resolutions + H.265 at 3 higher resolutions = 8 variants + audio; ~1.5 GB total	~1.5 GB
Daily storage growth	5M videos × 1.5 GB each	~7.5 PB/day
Average concurrent streams	Little's Law: 1B views/day ÷ 86,400 sec/day × 300 sec avg watch time (not all viewers finish the full 10 min)	~3.5 million concurrent
Peak concurrent streams	3.5M × 1.5 (peak-hour concentration)	~5 million concurrent
Bandwidth per stream (average)	720p at ~3 Mbps average across all quality levels	~3 Mbps
Peak egress bandwidth	5M concurrent × 3 Mbps	~15 Tbps
CDN cache hit ratio target	Popular videos cached at edge; long-tail fetched from origin	95%+ hit ratio

The key insight from the math: 15 Tbps of peak egress bandwidth cannot be served from a single data center or even a handful of data centers. This single number is why CDN is the #1 architectural decision for video streaming — it is not an optimization, it is a fundamental requirement. Without edge caching, the origin servers would need to deliver every byte of every video directly, which is physically and economically impossible at this scale. The CDN serves 95%+ of all video bytes; the origin handles only the first request for each video segment at each edge location.

The storage math is equally striking: 7.5 PB per day means the system accumulates roughly 2.7 exabytes per year. Object storage (like S3 or GCS) with tiered storage classes — frequently accessed videos on hot storage, older videos on cold/archive storage — is the only economically viable approach.
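The arithmetic in the table can be reproduced in a few lines. This is only a sketch of the estimates above: every input is one of the table's assumptions, and the variable names are illustrative. Without the table's intermediate rounding, the peak figures come out slightly higher (about 5.2 million streams and 15.6 Tbps).

```python
# Back-of-the-envelope figures from the estimation table (decimal units).
SECONDS_PER_DAY = 86_400

dau = 100_000_000            # assumed daily active users
views_per_user = 10          # assumed videos watched per user per day
uploads_per_day = 5_000_000  # assumed new videos per day
gb_per_video = 1.5           # all transcoded variants plus audio
avg_watch_sec = 300          # average watch time per view
peak_factor = 1.5            # peak-hour concentration
mbps_per_stream = 3          # average bitrate across quality levels

views_per_day = dau * views_per_user
views_per_sec = views_per_day / SECONDS_PER_DAY
# Little's Law: concurrent streams = arrival rate * time in system
avg_concurrent = views_per_sec * avg_watch_sec
peak_concurrent = avg_concurrent * peak_factor
peak_egress_tbps = peak_concurrent * mbps_per_stream / 1_000_000
daily_storage_pb = uploads_per_day * gb_per_video / 1_000_000
yearly_storage_eb = daily_storage_pb * 365 / 1_000

print(f"views/sec:          {views_per_sec:,.0f}")
print(f"avg concurrent:     {avg_concurrent:,.0f}")
print(f"peak concurrent:    {peak_concurrent:,.0f}")
print(f"peak egress (Tbps): {peak_egress_tbps:.1f}")
print(f"storage/day (PB):   {daily_storage_pb:.1f}")
print(f"storage/year (EB):  {yearly_storage_eb:.2f}")
```

Changing any single assumption (for example, a 5 Mbps average bitrate) shows quickly which estimates are sensitive to it.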

Step 3: High-Level Design

A video streaming system has two fundamentally separate paths that must be designed independently:

Upload/Processing path: Creator uploads video → API servers validate → message queue → transcoding workers produce multiple resolutions → segments stored in object storage → metadata updated (slow, CPU-intensive, latency-tolerant)
Viewing/Streaming path: Viewer requests video → CDN serves cached segments → origin fetches on cache miss → client uses adaptive bitrate to pick quality (fast, bandwidth-intensive, latency-critical)

What each component does:

Upload Service — Accepts video uploads via a resumable upload protocol (so large files survive network interruptions). Stores the raw file in a temporary S3 staging bucket and publishes a video_uploaded event to Kafka. Returns 202 Accepted immediately — transcoding happens asynchronously in the background.
Kafka — Holds video_uploaded events until transcoding workers process them. Decouples the fast upload acceptance from the slow, CPU-intensive transcoding work. If a worker crashes mid-transcode, Kafka re-delivers the event to another worker.
Transcoding Workers — A horizontally-scaled consumer group that reads upload events from Kafka. For each video, they extract audio, split the video into segments, transcode each segment into multiple resolutions and codecs in parallel, and write the output segments to origin object storage. Once complete, they update the video's status in the Metadata Service to "ready."
Origin Object Storage (S3/GCS) — The durable, permanent store for all transcoded video segments. Organized by video ID, resolution, and segment number. The CDN pulls segments from here on cache miss.
CDN Edge Nodes — Geographically distributed cache servers that store and serve video segments close to viewers. A popular video's segments are cached at edge nodes worldwide; a long-tail video is fetched from origin on the first request at each edge location and then cached.
Metadata Service — Manages video metadata (title, description, status, thumbnail URL, available resolutions). Backed by PostgreSQL with a Redis cache for fast reads. The streaming client fetches the HLS manifest URL from this service before playback begins.
Search Service — Powers video search using Elasticsearch for full-text indexing on titles, descriptions, and tags.

API Design

Endpoint	Method	Request / Response	Notes
`POST /api/v1/videos`	POST	Body: `{ title, description, tags[] }` → `201 { video_id, upload_url }`	Creates video metadata record with status 'pending'; returns a pre-signed upload URL for the client to upload the file directly to S3
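`PUT /api/v1/videos/{id}/upload`	PUT	Binary video file → `202 Accepted { status: 'processing' }`	Resumable upload to the S3 staging bucket, addressed through the pre-signed upload_url so the bytes do not pass through the API servers; completion triggers the transcoding pipeline via a Kafka event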
`GET /api/v1/videos/{id}`	GET	→ `200 { video_id, title, manifest_url, status, resolutions[] }`	Returns metadata including the HLS manifest URL; client uses manifest to begin adaptive streaming
`GET /api/v1/videos/{id}/stream`	GET	→ `200` HLS manifest (.m3u8)	Returns the master manifest listing all available quality levels and their segment playlists; actual video bytes are served from CDN
`GET /api/v1/search?q=term`	GET	→ `200 { results[], total, next_cursor }`	Full-text search on title, description, tags via Elasticsearch; cursor-based pagination
`POST /api/v1/videos/{id}/like`	POST	→ `200 { like_count }`	Atomic increment of denormalized like count; composite primary key prevents duplicate likes

Why a pre-signed upload URL? Video files are large (hundreds of MBs to several GBs). Routing them through API servers would consume massive bandwidth and memory on servers that also handle metadata requests. Instead, the API server generates a pre-signed S3 URL — a temporary, authenticated URL that lets the client upload directly to object storage, bypassing the API server entirely. The API server stays lightweight and handles only metadata operations.

Why resumable uploads? A 2 GB video upload on a mobile network can easily be interrupted by a dropped connection. Without resumable uploads, the user must restart from scratch. Resumable upload protocols (like the tus protocol or S3 multipart upload) split the file into chunks and track which chunks have been received, so the upload can resume from where it left off after an interruption.
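The chunk bookkeeping behind a resumable upload can be sketched as follows. This is a minimal in-memory illustration, not the tus protocol or the S3 multipart API: the class name `ResumableUpload` and its methods are hypothetical, and a real service would persist this state rather than hold it in memory.

```python
from dataclasses import dataclass, field


@dataclass
class ResumableUpload:
    """Tracks which fixed-size chunks of one file have been received."""
    total_bytes: int
    chunk_bytes: int
    received: set[int] = field(default_factory=set)

    @property
    def chunk_count(self) -> int:
        # Ceiling division: the last chunk may be shorter than chunk_bytes.
        return -(-self.total_bytes // self.chunk_bytes)

    def mark_received(self, index: int) -> None:
        if not 0 <= index < self.chunk_count:
            raise ValueError(f"chunk index {index} out of range")
        # Adding to a set is idempotent, so a retried chunk is harmless.
        self.received.add(index)

    def missing(self) -> list[int]:
        """Chunk indexes the client still has to send, in order."""
        return [i for i in range(self.chunk_count) if i not in self.received]

    def is_complete(self) -> bool:
        return len(self.received) == self.chunk_count


# A 2 GB file in 8 MB chunks, interrupted after the first 100 chunks.
upload = ResumableUpload(total_bytes=2_000_000_000, chunk_bytes=8_000_000)
for i in range(100):
    upload.mark_received(i)
print(upload.chunk_count, upload.missing()[0])
```

After a dropped connection, the client asks which chunks are missing and resumes from the first one instead of re-sending the whole file.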

Step 4: Deep Dive: Video Transcoding Pipeline

When a creator uploads a video, the raw file is just the beginning. The system must convert that single file into dozens of versions — different resolutions, different codecs, different segment durations — so that every device on every network can play it smoothly.

Think of transcoding like translating a book into multiple languages and formats simultaneously: paperback, audiobook, large-print edition, and e-book. The original manuscript is written once; the publisher produces all the variants needed for every possible reader.

Why transcode at all? A 4K video looks great on a large monitor with fiber internet, but it would be completely unwatchable on a phone with a 3G connection — the data would arrive slower than the video plays, causing constant buffering. By producing the same video at multiple resolutions (360p through 4K), the system lets each viewer receive exactly the quality their device and network can handle. The client automatically switches between quality levels as conditions change — this is adaptive bitrate streaming, covered in Step 5.

The DAG pipeline: Transcoding is structured as a directed acyclic graph (DAG) of tasks — a set of processing steps where some depend on others and some can run in parallel. Audio extraction runs independently of video segmentation. Each segment can be transcoded to all resolutions in parallel across multiple workers. This parallelism is critical: a 10-minute video split into 150 four-second segments can be transcoded by 150 workers simultaneously, completing in minutes instead of hours.
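One way to picture the DAG is to build it explicitly and ask which tasks are ready at each point. The sketch below uses Python's standard `graphlib` module; the task names (`split_video`, `extract_audio`, `encode/...`, `write_manifest`) are hypothetical labels chosen for illustration, not names from a real pipeline.

```python
from graphlib import TopologicalSorter


def build_transcode_dag(segment_count: int, renditions: list[str]) -> dict[str, set[str]]:
    """Map each task name to the set of tasks it depends on."""
    dag: dict[str, set[str]] = {
        "split_video": set(),    # no dependencies
        "extract_audio": set(),  # independent of video segmentation
    }
    encode_tasks = set()
    for seg in range(segment_count):
        for rendition in renditions:
            task = f"encode/{rendition}/{seg:04d}"
            dag[task] = {"split_video"}  # needs only the split segments
            encode_tasks.add(task)
    # The manifest can be written only once every output exists.
    dag["write_manifest"] = encode_tasks | {"extract_audio"}
    return dag


def parallel_waves(dag: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; all tasks within a wave can run concurrently."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = parallel_waves(build_transcode_dag(150, ["360p", "720p"]))
print([len(w) for w in waves])
```

For a 150-segment video at two renditions, the middle wave holds 300 independent encode tasks, which is where the horizontal scaling comes from. A real scheduler would start each task as soon as its own dependencies finish rather than waiting for a whole wave.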

Video Transcoding: Codec and Resolution Strategy

The transcoding pipeline must decide which codecs and resolutions to produce. More variants mean a better viewer experience, but storage and compute cost grow in proportion to the number of variants. Production systems choose a practical subset based on device analytics and cost constraints.

Storage organization: Transcoded segments are stored in object storage with a predictable path structure: videos/{video_id}/{codec}/{resolution}/segment_{number}.ts. This structure makes it trivial for the CDN to construct the URL for any segment of any video at any quality level. The HLS manifest file references these paths, and the client requests them one at a time during playback.

Failure handling: If a transcoding worker crashes mid-processing, its offset for the video_uploaded event is never committed, so Kafka delivers the event again to another worker in the consumer group. The replacement worker checks which segments have already been written to object storage and resumes from where the previous worker left off. Re-writing a segment is idempotent: overwriting the same key with the same content has no side effect.
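The path layout and the resume step can be sketched together. The helper names below are hypothetical, and the three-digit zero padding is an assumption taken from the `segment_001.ts` naming in the manifest example; listing the keys that already exist in object storage is left out and represented by a plain set.

```python
from typing import Iterable


def segment_key(video_id: str, codec: str, resolution: str, number: int) -> str:
    """Object-storage key for one transcoded segment (numbers start at 1)."""
    if number < 1:
        raise ValueError("segment numbers start at 1")
    return f"videos/{video_id}/{codec}/{resolution}/segment_{number:03d}.ts"


def expected_keys(video_id: str, renditions: list[tuple[str, str]],
                  segment_count: int) -> list[str]:
    """Every key the finished transcode should contain, in a stable order."""
    return [
        segment_key(video_id, codec, resolution, n)
        for codec, resolution in renditions
        for n in range(1, segment_count + 1)
    ]


def remaining_keys(expected: Iterable[str], already_written: set[str]) -> list[str]:
    """Keys a replacement worker still has to produce after a crash."""
    return [key for key in expected if key not in already_written]


keys = expected_keys("abc123", [("h264", "720p")], segment_count=3)
done = {keys[0], keys[1]}  # written before the first worker crashed
print(remaining_keys(keys, done))
```

Because the key for each segment is deterministic, a retried write lands on the same object, which is what makes the resume safe.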

Step 5: Deep Dive: Adaptive Bitrate Streaming (ABR)

Adaptive bitrate streaming is the technology that makes video playback smooth across wildly varying network conditions. Without it, a viewer would have to manually choose "720p" or "1080p" before pressing play — and if their bandwidth dropped mid-video, playback would freeze.

Think of ABR like a highway with multiple lanes. When traffic is flowing freely, you drive in the fast lane (high quality). When congestion hits, you seamlessly merge into a slower lane (lower quality) — the car keeps moving, just at a reduced speed. The key insight: a brief quality reduction is always preferable to a complete stop (buffering).

How ABR works, step by step:

1. The video is split into small segments (typically 2 to 10 seconds each) during transcoding. Each segment is independently encoded at every quality level.
2. A master manifest file (.m3u8 for HLS, .mpd for DASH) lists all available quality levels and their bitrates.
3. When the player starts, it downloads the manifest and selects an initial quality, usually a conservative one to minimize startup time.
4. After downloading each segment, the player measures how long the download took and estimates available bandwidth.
5. For the next segment, the player picks the highest quality level whose bitrate is below the estimated bandwidth.
6. If bandwidth drops, the player switches to a lower quality for the next segment. If bandwidth improves, it switches up. The switch happens at a segment boundary, so the viewer sees a brief quality change rather than a pause, as long as the lowest quality level still fits the available bandwidth.
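The measure-then-pick loop described in these steps can be sketched in a few functions. This is a simplified throughput-based heuristic, not the algorithm of any particular player: the smoothing factor `alpha` and the `safety` margin are illustrative assumptions, and production players typically also take buffer level into account.

```python
from typing import Optional


def measure_throughput_bps(segment_bytes: int, download_sec: float) -> float:
    """Throughput observed while downloading one segment, in bits per second."""
    return segment_bytes * 8 / download_sec


def smooth(previous_bps: Optional[float], sample_bps: float, alpha: float = 0.3) -> float:
    """Exponentially weighted moving average, so one slow segment does not cause a panic switch."""
    if previous_bps is None:
        return sample_bps
    return alpha * sample_bps + (1 - alpha) * previous_bps


def pick_bitrate(ladder_bps: list[int], estimate_bps: float, safety: float = 0.8) -> int:
    """Highest rung that fits within a safety margin of the estimate.

    Falls back to the lowest rung when nothing fits.
    """
    budget = estimate_bps * safety
    affordable = [b for b in ladder_bps if b <= budget]
    return max(affordable) if affordable else min(ladder_bps)


ladder = [800_000, 2_400_000, 5_000_000]  # bitrates from the example master manifest
estimate = None
for seg_bytes, seconds in [(1_500_000, 2.0), (1_500_000, 6.0), (400_000, 4.0)]:
    estimate = smooth(estimate, measure_throughput_bps(seg_bytes, seconds))
    print(f"estimate {estimate / 1e6:.2f} Mbps -> next segment at {pick_bitrate(ladder, estimate)} bps")
```

The safety margin leaves headroom for estimation error. Without it, a player that picks a bitrate exactly equal to its estimate tends to drain its buffer whenever throughput dips slightly.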

HLS vs. DASH: Streaming Protocol Comparison

HLS (HTTP Live Streaming) and DASH (Dynamic Adaptive Streaming over HTTP) are the two dominant adaptive streaming protocols. Both work on the same principle — chunked video segments served over plain HTTP — but differ in origin, format, and ecosystem support.

The manifest file in detail: An HLS master manifest is a plain-text file that lists all available quality levels with their bitrates. Here is a simplified example:

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2400000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8

Each quality-level playlist then lists the individual segment files (a complete video-on-demand playlist would also end with an #EXT-X-ENDLIST tag):

#EXTM3U
#EXT-X-TARGETDURATION:4
#EXTINF:4.0,
segment_001.ts
#EXTINF:4.0,
segment_002.ts

The video player reads the master manifest, measures its bandwidth, selects the appropriate quality playlist, and starts fetching segments. Everything is plain HTTP — no special streaming protocol needed. This is why video streaming works through any CDN, proxy, or firewall that supports HTTP.
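To show how little machinery the manifest needs, here is a minimal parser for a master manifest like the one above. It is a sketch only: it handles the simple unquoted attributes in the example, and would need a proper attribute-list parser for quoted values that contain commas (such as a CODECS attribute). The function name is hypothetical.

```python
MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2400000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
"""


def parse_master_manifest(text: str) -> list[dict]:
    """Return one dict per variant: bandwidth, resolution, and playlist URI."""
    variants = []
    pending = None  # attributes from the most recent #EXT-X-STREAM-INF tag
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            # Naive split: assumes no quoted attribute values containing commas.
            pairs = line.split(":", 1)[1].split(",")
            attrs = dict(pair.split("=", 1) for pair in pairs)
            pending = {
                "bandwidth": int(attrs["BANDWIDTH"]),
                "resolution": attrs.get("RESOLUTION"),
            }
        elif line and not line.startswith("#") and pending is not None:
            # The first non-comment line after the tag is that variant's URI.
            pending["uri"] = line
            variants.append(pending)
            pending = None
    return variants


for variant in parse_master_manifest(MASTER):
    print(variant)
```

The list of bandwidths this produces is exactly the bitrate ladder the player chooses from during playback.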

Step 6: Deep Dive: CDN and Global Distribution

The CDN is the most important architectural decision in the entire system. At 15 Tbps of peak egress bandwidth, no origin infrastructure can serve video bytes directly to all viewers. The CDN absorbs 95%+ of all traffic, making the origin's job manageable.

Think of a CDN like a franchise model for a library. The central library (origin) holds every book ever published. Branch libraries (edge nodes) in every neighborhood stock copies of the most popular books. When a reader wants a bestseller, they get it instantly from their local branch. When they want a rare title, the branch requests it from the central library, hands it to the reader, and keeps a copy for the next reader who asks.

The tiered caching hierarchy: Production CDNs use multiple cache tiers to minimize origin fetches. Edge nodes are closest to viewers (lowest latency, smallest cache). Mid-tier nodes sit between edge and origin (larger cache, fewer locations). A request that misses the edge cache checks the mid-tier before going all the way to origin. This reduces origin traffic by an additional 30-50% compared to a single-tier CDN.
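The edge, mid-tier, origin lookup order can be simulated with two small LRU caches. This is a toy model under stated assumptions: LRU eviction, a dict standing in for origin object storage, and capacities counted in objects rather than bytes. Real CDNs use more elaborate admission and eviction policies.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items: OrderedDict[str, bytes] = OrderedDict()

    def get(self, key: str):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


def fetch(key: str, edge: LRUCache, mid_tier: LRUCache, origin: dict) -> tuple[bytes, str]:
    """Return (segment bytes, tier that served it), filling caches on the way back."""
    data = edge.get(key)
    if data is not None:
        return data, "edge"
    data = mid_tier.get(key)
    if data is not None:
        edge.put(key, data)
        return data, "mid-tier"
    data = origin[key]  # cache miss at both tiers: go to origin
    mid_tier.put(key, data)
    edge.put(key, data)
    return data, "origin"


origin = {"seg_a": b"A", "seg_b": b"B"}
edge, mid_tier = LRUCache(capacity=1), LRUCache(capacity=2)
for key in ["seg_a", "seg_a", "seg_b", "seg_a"]:
    print(key, fetch(key, edge, mid_tier, origin)[1])
```

The last request shows the point of the mid-tier: the small edge cache has already evicted `seg_a`, but the larger mid-tier still holds it, so the origin is not contacted again.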

CDN Caching Strategy: Push vs. Pull

CDN caching follows two models: pull-through (edge fetches on first request, then caches) and push/pre-position (popular content is proactively placed at edge before any viewer requests it). Production systems use both — push for predictable demand, pull for everything else.

Netflix Open Connect: Netflix takes CDN to its logical extreme by placing dedicated hardware appliances (called Open Connect Appliances, or OCAs) inside ISP networks worldwide. During off-peak hours (typically 2am-8am local time), Netflix pre-fills these appliances with content predicted to be popular the next day. When a subscriber presses play, the video is served from a box sitting inside their own ISP's network — often just one network hop away. This eliminates transit costs and delivers the lowest possible latency. YouTube operates a similar system called Google Global Cache (GGC).

CDN cache key design: Each cached object needs a unique key. For video segments, the cache key is typically: /{video_id}/{codec}/{resolution}/segment_{number}.ts. This means the same segment at different quality levels is cached independently — the CDN does not need to understand video formats, it just caches HTTP responses keyed by URL path.

Step 7: Deep Dive: Video Metadata and Search

While video bytes flow through CDN and object storage, everything else about a video — its title, description, tags, view count, comments, and the creator who uploaded it — lives in a traditional database. This metadata system must support fast reads (every video page load), reliable writes (view count increments at 1B/day), and full-text search.

View counting at scale: At 1 billion views per day (~11,500 view increments per second), updating the database directly for every view would create massive write contention on popular videos. Instead, view counts are accumulated in Redis using atomic INCR operations and flushed to PostgreSQL in batches every 30 seconds. This means the view count displayed to users may lag reality by up to 30 seconds — an acceptable trade-off for eliminating database contention. YouTube itself shows view counts that can lag by hours; exact real-time accuracy is not required.
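The accumulate-then-flush pattern looks roughly like this. The sketch uses an in-memory `Counter` as a stand-in for the Redis INCR counters so that it runs without a server. The class and function names are hypothetical, and the `%s` placeholders assume a psycopg-style PostgreSQL driver.

```python
from collections import Counter


class ViewCountBuffer:
    """In-memory stand-in for per-video Redis counters."""

    def __init__(self) -> None:
        self._pending: Counter[int] = Counter()

    def record_view(self, video_id: int) -> None:
        # In the real system this would be an atomic INCR on a Redis key.
        self._pending[video_id] += 1

    def drain(self) -> dict[int, int]:
        """Return the accumulated deltas and reset the buffer."""
        batch, self._pending = dict(self._pending), Counter()
        return batch


def flush_statements(batch: dict[int, int]) -> list[tuple[str, tuple[int, int]]]:
    """One UPDATE per video that was viewed, instead of one per view."""
    sql = "UPDATE videos SET view_count = view_count + %s WHERE video_id = %s"
    return [(sql, (delta, video_id)) for video_id, delta in sorted(batch.items())]


buffer = ViewCountBuffer()
for video_id in [7, 7, 9, 7]:
    buffer.record_view(video_id)

# A scheduler would run this every 30 seconds.
for statement, params in flush_statements(buffer.drain()):
    print(statement, params)
```

Four views become two database writes here. For a viral video receiving thousands of views per second, the reduction is from thousands of row updates per second to one every 30 seconds.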

Video Metadata Schema and Search

The metadata layer must serve two access patterns: point lookups by video ID (for video pages) and full-text search across titles and descriptions (for the search bar). PostgreSQL handles the first pattern; Elasticsearch handles the second. Keeping both in sync requires a change data capture (CDC) pipeline.

Database schema (simplified):

CREATE TABLE videos (
  video_id      BIGINT PRIMARY KEY,
  uploader_id   BIGINT NOT NULL REFERENCES users(id),
  title         VARCHAR(500) NOT NULL,
  description   TEXT,
  tags          TEXT[],
  status        VARCHAR(20) NOT NULL DEFAULT 'pending', -- pending, processing, ready, failed
  duration_sec  INT,
  manifest_url  TEXT,                    -- HLS manifest path in object storage
  thumbnail_url TEXT,
  view_count    BIGINT NOT NULL DEFAULT 0,
  like_count    BIGINT NOT NULL DEFAULT 0,
  created_at    TIMESTAMPTZ NOT NULL,
  updated_at    TIMESTAMPTZ NOT NULL
);

CREATE INDEX ON videos (uploader_id, created_at DESC);
CREATE INDEX ON videos (status) WHERE status != 'ready';  -- partial index for processing pipeline

Step 8: Trade-offs and Production Reality

Every architectural decision in this system involves a deliberate trade-off. The table below summarizes what each choice optimizes and what it gives up.

Design Decision	What It Optimizes	What It Sacrifices
Separate upload and viewing paths	Upload processing (CPU-intensive) never competes with streaming (bandwidth-intensive) for resources	Two independent systems to build, deploy, and monitor — more operational complexity
CDN-first architecture	95%+ of video bytes served from edge, sub-100ms latency globally, origin load reduced by 20x	CDN costs are the largest line item in the budget; cache invalidation for updated/deleted content requires coordination
Multi-codec transcoding (H.264 + H.265 + AV1)	Every device gets the best quality it can decode; newer codecs save 30-50% bandwidth	Storage multiplied per codec; transcoding compute cost increases linearly with each additional codec
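4-second HLS segments	Good balance of startup latency (the player needs only the first segment, fetched at a conservative bitrate, before showing a frame) and compression efficiency	Slightly higher startup latency than 2-second segments; quality switches happen only at segment boundaries, so they can take up to 4 seconds to take effect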
Adaptive bitrate streaming	Smooth playback across all network conditions — quality degrades gracefully instead of buffering	Quality fluctuations are visible to attentive viewers; the player's bandwidth estimation adds complexity
Elasticsearch for search	Millisecond full-text search with relevance ranking across billions of videos	Eventually consistent with PostgreSQL; CDC pipeline is another failure point to monitor
Redis for view counts	Absorbs 11,500+ increments/second without database contention	View counts lag reality by up to 30 seconds; Redis failure could temporarily lose in-flight counts
Pre-signed upload URLs	Large file uploads bypass API servers entirely, keeping them lightweight for metadata	Clients must handle the two-step flow (get URL, then upload); upload progress tracking is more complex

What this design intentionally excludes

Production video streaming platforms include additional layers that are out of scope for this case study:

DRM (Digital Rights Management) — encryption of video segments with license key servers (Widevine, FairPlay, PlayReady) to prevent unauthorized downloading and redistribution
Live streaming — real-time ingest, transcoding, and distribution with sub-10-second latency requirements (a fundamentally different architecture from video-on-demand)
Monetization and ad insertion — server-side ad insertion (SSAI) that stitches ad segments into the video stream, requiring manifest manipulation per viewer
Content moderation — automated detection of policy-violating content using ML models during or after upload processing
Offline download — encrypted local storage of video segments for offline playback with expiry and license management

Each of these adds significant complexity but does not fundamentally change the core upload-transcode-CDN-stream architecture described here.

When Would You Design This Differently?

Scenario	Simpler Approach	Upgrade Trigger
Startup: under 10K DAU	Skip transcoding — serve the original upload directly. Use a single CDN. No adaptive bitrate needed.	When users on slow connections report buffering, or when CDN egress costs exceed the cost of transcoding
Short-form video (TikTok-like)	Transcode to fewer resolutions (360p, 720p, 1080p only). Use 2-second segments for faster startup. Aggressive CDN pre-positioning since content is small.	When you expand to long-form content where 4K and segment-level quality switching matter
Internal corporate video	Single resolution, single codec. No CDN — serve from origin within the corporate network.	When employees access videos from remote offices or mobile devices with varying bandwidth
Educational platform (small catalog)	Pre-transcode everything at upload time. No need for a transcoding queue — synchronous processing is fine for minutes-long waits.	When upload volume exceeds what synchronous processing can handle, or when processing blocks the upload API

Summary

Component	Design Decision	Key Reasoning
Upload pipeline	Resumable upload → S3 staging → Kafka → transcoding workers → object storage	Decouples upload acceptance (fast) from transcoding (slow, CPU-intensive); Kafka ensures no upload is lost if a worker crashes
Transcoding	DAG pipeline producing H.264 (all resolutions) + H.265/AV1 (high resolutions)	Parallel segment transcoding reduces processing time from hours to minutes; multi-codec supports all devices while minimizing bandwidth
Streaming protocol	HLS with 4-second segments and adaptive bitrate	Universal device support; ABR ensures smooth playback by switching quality based on real-time bandwidth measurement
CDN	Multi-tier CDN serving 95%+ of video bytes; pull-through for most content, pre-positioning for trending videos	15 Tbps peak bandwidth is physically unservable from origin; CDN is the foundational requirement, not an optimization
Metadata	PostgreSQL + Redis cache for video metadata; Elasticsearch for search; Redis INCR for view counts	ACID guarantees for creator content; sub-millisecond reads via cache; full-text search via purpose-built engine
Storage	S3/GCS object storage with tiered storage classes	Only economically viable option for exabyte-scale video storage; hot tier for recent/popular content, cold tier for archive
API design	Pre-signed upload URLs; separate endpoints for metadata vs. streaming	Large uploads bypass API servers; viewing path is CDN-first with API serving only manifests and metadata
Search	Elasticsearch synced via CDC from PostgreSQL	Avoids dual-write inconsistency; Elasticsearch is rebuildable from the PostgreSQL source of truth at any time

The video streaming system is, at its core, a content preparation and delivery pipeline. The upload path converts a single raw file into dozens of optimized variants. The viewing path serves those variants from locations as close to the viewer as possible. The CDN is the architectural keystone — everything else in the system exists to feed content into the CDN efficiently and serve metadata alongside it. Master the separation of upload processing from viewing delivery, and the rest of the design follows naturally.