YouTube Architecture
§1 Step 2 — High-Level Design
Handle 500 hours of video uploaded per minute. Transcoding farms, CDN strategy, and recommendation engine.
Add an edge API gateway to split and route upload, playback, search, and feed traffic to their respective backend services.
The API gateway handles: SSL termination, authentication (OAuth2), routing (upload → upload service, playback → playback service, search → search API, feed → recommendation API), rate limiting (100 uploads/day/channel), and bot detection for view counts.
Without a unified gateway, each backend service would have to implement its own authentication and rate limiting. The gateway also enables traffic management: during an upload spike, playback traffic can be prioritized. Finally, it provides a stable surface for A/B testing routing logic.
The gateway is in the critical path for every request. YouTube uses Google's own load balancing infrastructure (Maglev + Envoy) which can handle 100M+ RPS globally. Custom gateway logic must be kept minimal to avoid becoming a bottleneck.
YouTube routes traffic through Google's Maglev load balancer and Edge Proxy infrastructure. This handles geo-routing (send users to the nearest YouTube data center), SSL termination at the edge, and anti-DDoS.
YouTube peak: 5M+ concurrent users. Gateway throughput: 1M+ RPS. SSL termination overhead: ~0.5ms. Rate limit: 100 video uploads/day per channel. Auth token validation: <1ms (cached).
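The routing and rate-limiting duties described above can be sketched in a few lines. This is a minimal illustration, not YouTube's gateway: the path prefixes, the class name `DailyUploadLimiter`, and the fixed-window counting strategy are all assumptions; only the route targets and the 100 uploads/day/channel limit come from the text.

```python
from collections import defaultdict
from datetime import date
from typing import Optional

# Route targets come from the text; the path prefixes are hypothetical.
ROUTES = {
    "/upload": "upload-service",
    "/watch": "playback-service",
    "/search": "search-api",
    "/feed": "recommendation-api",
}
UPLOADS_PER_DAY = 100  # per-channel limit stated in the text


class DailyUploadLimiter:
    """Fixed-window counter: at most `limit` uploads per channel per calendar day."""

    def __init__(self, limit: int = UPLOADS_PER_DAY) -> None:
        self.limit = limit
        self._counts = defaultdict(int)  # (channel_id, day) -> uploads so far

    def allow(self, channel_id: str, today: date) -> bool:
        key = (channel_id, today)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


def route(path: str) -> Optional[str]:
    """Return the backend for a request path, or None if no prefix matches."""
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend
    return None
```

A production gateway would keep these counters in a shared store rather than in process memory, so that every gateway instance sees the same count.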
Add object storage as the durable origin for raw uploaded video and all transcoded segments (HLS/DASH chunks in multiple resolutions).
Object storage holds: raw uploaded video (multi-GB files), transcoded HLS segments per resolution (360p/720p/1080p/4K), VP9/AV1 encoded variants, audio-only streams, and thumbnail images. Video is stored as immutable content once transcoded.
Video is write-once, read-many. Object storage is optimized for this pattern: high durability (11 nines), globally redundant, and supports byte-range requests (for video seeking). CDN caches segments from object storage origin.
Object storage has variable latency (50–500ms per request). CDNs cache the hot content, but cold content (older videos) requires origin fetches. YouTube uses regional object storage tiers: hot storage (recent uploads), warm storage (last year), cold storage (archives).
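The hot/warm/cold split can be expressed as a simple age-based rule. The sketch below assumes that "recent uploads" means the last 30 days, which the text does not specify; the one-year warm boundary follows the text. Real tiering policies may also consider access frequency, which is not modeled here.

```python
from datetime import date, timedelta

HOT_WINDOW = timedelta(days=30)    # assumption: "recent uploads" = last 30 days
WARM_WINDOW = timedelta(days=365)  # "last year", as in the text


def storage_tier(uploaded_on: date, today: date) -> str:
    """Pick a storage tier from the age of the video."""
    age = today - uploaded_on
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"
```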
YouTube uses Google Cloud Storage (GCS) internally. At 500 hours of video uploaded per minute × avg 2 GB/hour × multiple resolutions: ~5 TB of new storage per minute. Total YouTube storage: estimated at 1+ exabyte.
Raw upload: 500 hours/min × 2 GB/hour = 1 TB/min raw. After transcoding (5 resolutions): ~5 TB/min total. GCS durability: 11 nines. Retrieval for CDN: byte-range requests at 1 Gbps per segment.
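The storage figures above follow from straightforward multiplication, worked out below. The per-day total is derived here and does not appear in the source. Note that the 5× multiplier treats every rendition as being as large as the raw upload, so it is best read as an upper bound.

```python
HOURS_UPLOADED_PER_MIN = 500
RAW_GB_PER_HOUR = 2
RENDITIONS = 5  # the text's multiplier; treats each rendition as raw-sized

raw_tb_per_min = HOURS_UPLOADED_PER_MIN * RAW_GB_PER_HOUR / 1000  # decimal TB
total_tb_per_min = raw_tb_per_min * RENDITIONS
total_pb_per_day = total_tb_per_min * 60 * 24 / 1000  # derived, not in the source

print(f"raw:   {raw_tb_per_min:.1f} TB/min")
print(f"total: {total_tb_per_min:.1f} TB/min")
print(f"total: {total_pb_per_day:.1f} PB/day")
```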
Add Kafka to carry upload-complete events from the upload API to transcoding workers, search indexers, and recommendation model updaters.
Kafka topics: upload-complete (triggers transcoding pipeline), transcode-complete (triggers CDN prewarming and search indexing), view-events (video plays, watch time, engagement for recommendation model), and ad-events (monetization tracking).
Video processing is a pipeline with multiple stages and different SLAs. Kafka decouples each stage: transcoding takes minutes; search indexing takes seconds; recommendation model updates are batched hourly. Without Kafka, a slow stage would block all subsequent processing.
Kafka makes the pipeline resilient to failures — if the search indexer is down, upload-complete events accumulate in Kafka and are processed when the indexer recovers. The tradeoff is eventual consistency: a newly uploaded video may not appear in search for 30–60 seconds.
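The recovery behavior described above comes from each consumer group tracking its own offset into a shared, append-only log. The sketch below is an in-memory stand-in for a single topic partition; it is not the Kafka client API, and the `EventLog` class and group names are hypothetical.

```python
class EventLog:
    """In-memory stand-in for one topic partition (not the Kafka API)."""

    def __init__(self) -> None:
        self._events = []   # append-only list of events
        self._offsets = {}  # consumer group -> next offset to read

    def publish(self, event: dict) -> None:
        self._events.append(event)

    def poll(self, group: str, max_events: int = 100) -> list:
        start = self._offsets.get(group, 0)
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch

    def lag(self, group: str) -> int:
        return len(self._events) - self._offsets.get(group, 0)


log = EventLog()
for video_id in ("v1", "v2", "v3"):
    log.publish({"type": "upload-complete", "video_id": video_id})
    log.poll("transcoder")  # the transcoder keeps up with every upload

# The search indexer was down the whole time, so its events have accumulated.
lag_before = log.lag("search-indexer")
backlog = log.poll("search-indexer")  # on recovery it reads everything it missed
```

Because the transcoder and the indexer advance independently, a stalled indexer delays only search visibility, never transcoding.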
YouTube's upload pipeline (internally called 'Viper') processes 500 hours/minute of video. Transcoding runs on Google's custom TPU/GPU infrastructure. The pipeline from upload to available-for-viewing takes ~5 minutes for standard videos.
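Kafka ingest: assuming about 500 uploads/minute (a round illustrative figure; the 500 hours/minute quoted earlier measures footage, not upload count), that is ~8 uploads/second. Each upload triggers ~10 downstream events across transcoding pipeline stages, or roughly 80 events/second. View events: 1B+ per day = ~12K/second. Kafka retention: 7 days.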
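The event rates above can be checked with a few divisions. The inputs are the figures quoted in the text; the combined pipeline rate is derived from them.

```python
UPLOADS_PER_MIN = 500
DOWNSTREAM_EVENTS_PER_UPLOAD = 10
VIEW_EVENTS_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 86_400

uploads_per_sec = UPLOADS_PER_MIN / 60
pipeline_events_per_sec = uploads_per_sec * DOWNSTREAM_EVENTS_PER_UPLOAD
view_events_per_sec = VIEW_EVENTS_PER_DAY / SECONDS_PER_DAY

print(f"uploads:         {uploads_per_sec:.1f}/s")
print(f"pipeline events: {pipeline_events_per_sec:.1f}/s")
print(f"view events:     {view_events_per_sec:,.0f}/s")
```

View events outnumber pipeline events by more than two orders of magnitude, so the view-events topic is the one that drives partition count and broker sizing.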
§2 Step 3 — Deep Dive
| Protocol | Latency | Adaptive bitrate | CDN cacheable | Best for | Cost | Ops burden |
|---|---|---|---|---|---|---|
| HLS (HTTP Live Streaming) | 6-30s buffer | Yes (variant playlist) | Yes (HTTP segments) | VOD, live streaming, iOS ✓ | Medium | Low |
| DASH (MPEG-DASH) | 2-10s buffer | Yes (MPD manifest) | Yes (HTTP segments) | Cross-platform VOD, DRM ✓ | Medium | Low |
| WebRTC | <100ms | No (fixed codec) | No (peer-to-peer) | Live conferencing, sub-second latency | Medium | Medium |
| RTMP (legacy) | 1-3s | No | No (persistent conn) | Ingest from OBS -> transcoder (ingest only) | Low | Low |
| Progressive download (MP4) | Full buffering | No | Yes (byte range) | Short clips, no adaptive needed | Low | Low |
Video streaming protocols — HLS wins for adaptive bitrate + CDN compatibility.
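Adaptive bitrate means the player picks a variant from the playlist based on measured throughput. The sketch below shows one simple selection rule. The ladder uses the video bitrates from the ffmpeg script in this section; the 0.8 safety factor and the function name are assumptions, and real players also weigh buffer level and screen size.

```python
# Video bitrates in kbit/s, matching the ffmpeg renditions in this section.
LADDER_KBPS = {"1080p": 5000, "720p": 2800, "480p": 1400, "360p": 800}
SAFETY_FACTOR = 0.8  # assumption: budget only 80% of measured throughput


def pick_variant(measured_kbps: float) -> str:
    """Return the highest rendition whose bitrate fits the bandwidth budget."""
    budget = measured_kbps * SAFETY_FACTOR
    fitting = [name for name, kbps in LADDER_KBPS.items() if kbps <= budget]
    if not fitting:
        # Nothing fits: fall back to the lowest rendition rather than stalling.
        return min(LADDER_KBPS, key=LADDER_KBPS.get)
    return max(fitting, key=LADDER_KBPS.get)
```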
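```bash
#!/bin/bash
# Transcode one input into a 4-rendition HLS ladder, then upload it to the CDN origin.
set -euo pipefail

INPUT="$1"
OUTPUT_DIR="$2"
mkdir -p "$OUTPUT_DIR"

# Scaling happens inside -filter_complex: ffmpeg rejects -vf on streams fed by a
# complex filtergraph. Per-stream options (-c:v:N, -b:v:N) keep each rendition's
# settings separate. The audio is mapped once per variant because each stream
# may belong to only one entry in -var_stream_map.
ffmpeg -i "$INPUT" \
  -filter_complex "[0:v]split=4[v1][v2][v3][v4];[v1]scale=1920:1080[v1out];[v2]scale=1280:720[v2out];[v3]scale=854:480[v3out];[v4]scale=640:360[v4out]" \
  -map "[v1out]" -c:v:0 libx264 -b:v:0 5000k \
  -map "[v2out]" -c:v:1 libx264 -b:v:1 2800k \
  -map "[v3out]" -c:v:2 libx264 -b:v:2 1400k \
  -map "[v4out]" -c:v:3 libx264 -b:v:3 800k \
  -preset fast \
  -map 0:a -map 0:a -map 0:a -map 0:a -c:a aac -b:a 128k \
  -f hls -hls_time 6 -hls_list_size 0 \
  -hls_segment_filename "$OUTPUT_DIR/%v/seg%03d.ts" \
  -master_pl_name master.m3u8 \
  -var_stream_map "v:0,a:0 v:1,a:1 v:2,a:2 v:3,a:3" \
  "$OUTPUT_DIR/%v/index.m3u8"

# Segments are immutable once written, so a one-year cache lifetime is safe.
aws s3 sync "$OUTPUT_DIR/" "s3://cdn-origin/videos/$(basename "$INPUT" .mp4)/" \
  --cache-control "max-age=31536000"
```

| Component | Why Add It | Tradeoff |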
|---|---|---|
| API Gateway | Without a unified gateway, each backend service needs auth and rate limiting. | The gateway is in the critical path for every request. |
| Object Storage for Video | Video is write-once, read-many. | Object storage has variable latency (50–500ms per request). |
| Kafka for Upload Events | Video processing is a pipeline with multiple stages and different SLAs. | Eventual consistency: a newly uploaded video may not appear in search for 30–60 seconds. |
Design decision tradeoffs
worker-1 crashes mid-transcode on a large video. The partially transcoded file is corrupt. How do you implement idempotent transcoding jobs with checkpointing so the job restarts from where it left off without re-processing completed segments?
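One possible approach to the question above is to record each completed segment in a checkpoint file and skip those segments on restart. The sketch below assumes the video has already been split into named segments and that `transcode_segment` is a caller-supplied function that writes its output atomically (for example, to a temporary name followed by a rename); both are assumptions, not details from the source.

```python
import json
import os
from typing import Callable


def transcode_with_checkpoint(
    segments: list,
    checkpoint_path: str,
    transcode_segment: Callable[[str], None],
) -> list:
    """Transcode each segment at most once; return the ones processed in this run."""
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = set(json.load(f))

    processed = []
    for seg in segments:
        if seg in done:
            continue  # finished before the crash, so skip it
        transcode_segment(seg)
        done.add(seg)
        # Write the checkpoint to a temp file and rename it, so a crash
        # mid-write never leaves a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(sorted(done), f)
        os.replace(tmp_path, checkpoint_path)
        processed.append(seg)
    return processed
```

If the worker dies between `transcode_segment` and the checkpoint write, that one segment is transcoded again on restart. This is why the segment write itself needs to be idempotent.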
A video goes viral and 50M users try to stream it simultaneously. cdn-1 cache misses cause a stampede on storage-1 and playback-api. How do you implement CDN origin shield, cache coalescing (only one origin request per cache miss), and adaptive bitrate selection to reduce origin load?
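Cache coalescing (sometimes called single-flight) is the core of the answer to the stampede question above: the first miss for a key fetches from origin while concurrent misses for the same key wait for that result. The sketch below shows the idea inside one process using threads. The class name and the `fetch_origin` callback are hypothetical, and a real CDN applies the same logic at the origin-shield tier.

```python
import threading
from typing import Callable


class CoalescingCache:
    """Cache in which concurrent misses for one key share a single origin fetch."""

    def __init__(self, fetch_origin: Callable[[str], bytes]) -> None:
        self._fetch_origin = fetch_origin
        self._cache = {}     # key -> cached bytes
        self._inflight = {}  # key -> Event set when the fetch finishes
        self._lock = threading.Lock()

    def get(self, key: str) -> bytes:
        while True:
            with self._lock:
                if key in self._cache:
                    return self._cache[key]
                waiter = self._inflight.get(key)
                if waiter is None:
                    # First miss: this thread becomes the fetcher.
                    self._inflight[key] = threading.Event()
                    break
            # Follower: wait for the fetcher, then re-check the cache.
            waiter.wait()
        try:
            value = self._fetch_origin(key)
            with self._lock:
                self._cache[key] = value
            return value
        finally:
            # Wake followers even on failure; one of them will retry the fetch.
            with self._lock:
                self._inflight.pop(key).set()
```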
A major event causes 100K video uploads in 1 hour. The transcoding queue fills to 500K jobs. New uploads wait 48 hours for processing. How do you implement priority lanes (premium vs. free uploads), auto-scaling worker-1 instances, and progress notifications to users?
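Priority lanes and backlog-driven scaling, two of the mechanisms the question above asks about, can be sketched with a heap and a small sizing function. The lane constants, the class name, and every parameter of `desired_workers` are assumptions chosen for illustration.

```python
import heapq
import itertools

PREMIUM, FREE = 0, 1  # lower number is served first


class TranscodeQueue:
    """Priority queue with FIFO ordering inside each lane."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, video_id: str, lane: int) -> None:
        heapq.heappush(self._heap, (lane, next(self._seq), video_id))

    def next_job(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def desired_workers(queue_depth: int, jobs_per_worker: int = 50,
                    min_workers: int = 1, max_workers: int = 1000) -> int:
    """Target worker count proportional to the backlog, clamped to a range."""
    needed = -(-queue_depth // jobs_per_worker)  # ceiling division
    return max(min_workers, min(max_workers, needed))
```

Strict priority can starve the free lane during a sustained premium surge. Reserving a share of workers for each lane, or aging free jobs upward over time, are common ways to bound the wait.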
§3 Step 4 — Wrap Up
| Decision | Choice | Why |
|---|---|---|
| API Gateway | The API gateway handles: SSL termination, authentication (OAuth2), routing (upload → upload service, playback → playback service, search → search API, feed → recommendation API), rate limiting (100 uploads/day/channel), and bot detection for view counts. | Without a unified gateway, each backend service needs auth and rate limiting. |
| Object Storage for Video | Object storage holds: raw uploaded video (multi-GB files), transcoded HLS segments per resolution (360p/720p/1080p/4K), VP9/AV1 encoded variants, audio-only streams, and thumbnail images. | Video is write-once, read-many. |
| Kafka for Upload Events | Kafka topics: upload-complete (triggers transcoding pipeline), transcode-complete (triggers CDN prewarming and search indexing), view-events (video plays, watch time, engagement for recommendation model), and ad-events (monetization tracking). | Video processing is a pipeline with multiple stages and different SLAs. |
Key design decisions