YouTube Architecture

Advanced · Media · CDN · ML

Large-Scale · 90 min read

§1 Step 2: High-Level Design

Design for 500 hours of video uploaded per minute, covering transcoding farms, CDN strategy, and the recommendation engine.

System architecture overview

Progressive build — add each component step by step

Add API Gateway

Add an edge API gateway to split and route upload, playback, search, and feed traffic to their respective backend services.

What it does

The API gateway handles TLS (SSL) termination, authentication (OAuth2), routing (upload → upload service, playback → playback service, search → search API, feed → recommendation API), rate limiting (100 uploads per day per channel), and bot detection for view counts.
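Two of the duties listed above, routing and the per-channel upload quota, can be sketched in a few lines of shell. This is a minimal illustration, not YouTube's gateway: the path prefixes, the `route_request` and `allow_upload` names, and the file-based counter store are all assumptions made for the example.

```bash
#!/bin/bash
# Sketch of two gateway duties: prefix routing and a per-channel daily upload quota.
# Path prefixes and the file-based counter store are illustrative assumptions;
# a production gateway would keep counters in a shared store with atomic increments.

DAILY_UPLOAD_LIMIT=100
QUOTA_DIR="${QUOTA_DIR:-$(mktemp -d)}"

# route_request <path>: print the backend service for a request path.
route_request() {
  case "$1" in
    /upload*)   echo "upload-service" ;;
    /playback*) echo "playback-service" ;;
    /search*)   echo "search-api" ;;
    /feed*)     echo "recommendation-api" ;;
    *)          echo "unknown" ;;
  esac
}

# allow_upload <channel_id> [yyyy-mm-dd]: return 0 and count the upload if the
# channel is under its daily limit, otherwise return 1.
allow_upload() {
  channel="$1"
  day="${2:-$(date -u +%Y-%m-%d)}"
  counter_file="$QUOTA_DIR/$channel.$day"
  count=0
  if [ -f "$counter_file" ]; then
    count="$(cat "$counter_file")"
  fi
  if [ "$count" -ge "$DAILY_UPLOAD_LIMIT" ]; then
    return 1
  fi
  echo $((count + 1)) > "$counter_file"
}

route_request "/upload/videos"
if allow_upload "channel-42"; then echo "upload accepted"; fi
```

The read-then-write on the counter file is not atomic, so this only demonstrates the quota rule itself; concurrent gateways would need an atomic increment in a shared store.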

Why it matters

Without a unified gateway, each backend service must implement its own auth and rate limiting. The gateway also enables traffic management: during an upload spike, playback traffic can be prioritized. It also provides a stable surface for A/B testing routing logic.

Trade-off

The gateway is in the critical path for every request, so custom gateway logic must be kept minimal to avoid becoming a bottleneck. YouTube relies on Google's own load-balancing infrastructure (Maglev + Envoy, per this lesson), sized here at 100M+ RPS globally.

Real world

YouTube routes traffic through Google's Maglev load balancer and Edge Proxy infrastructure. This handles geo-routing (send users to the nearest YouTube data center), SSL termination at the edge, and anti-DDoS.

Capacity math

YouTube peak: 5M+ concurrent users. Gateway throughput: 1M+ RPS. SSL termination overhead: ~0.5ms. Rate limit: 100 video uploads/day per channel. Auth token validation: <1ms (cached).

Add Object Storage for Video

Add object storage as the durable origin for raw uploaded video and all transcoded segments (HLS/DASH chunks in multiple resolutions).

What it does

Object storage holds: raw uploaded video (multi-GB files), transcoded HLS segments per resolution (360p/720p/1080p/4K), VP9/AV1 encoded variants, audio-only streams, and thumbnail images. Video is stored as immutable content once transcoded.

Why it matters

Video is write-once, read-many. Object storage is optimized for this pattern: it offers high durability (11 nines), is globally redundant, and supports byte-range requests (needed for video seeking). The CDN caches segments fetched from the object storage origin.
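To make the seeking point concrete, the sketch below maps a playback time to a segment file and builds an HTTP Range header. It assumes the 6-second segments and `seg%03d.ts` naming used by the lesson's FFmpeg script; the function names are hypothetical.

```bash
#!/bin/bash
# Sketch: map a seek position to a segment, and build a byte-range request header.
SEGMENT_SECONDS=6   # assumption: matches -hls_time 6 in the lesson's FFmpeg script

# segment_for_time <seconds>: name of the segment containing that playback time.
segment_for_time() {
  printf 'seg%03d.ts\n' $(( $1 / SEGMENT_SECONDS ))
}

# range_header <offset> <length>: HTTP Range header for a byte window (end is inclusive).
range_header() {
  echo "Range: bytes=$1-$(( $1 + $2 - 1 ))"
}

segment_for_time 125      # prints seg020.ts
range_header 0 1048576    # prints Range: bytes=0-1048575

# With a real origin, the header could be passed to curl, for example:
#   curl -H "$(range_header 0 1048576)" "$SEGMENT_URL"   # SEGMENT_URL is a placeholder
```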

Trade-off

Object storage has variable latency (50–500ms per request). CDNs cache the hot content, but cold content (older videos) requires origin fetches. YouTube uses regional object storage tiers: hot storage (recent uploads), warm storage (last year), cold storage (archives).
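The tiering rule can be sketched as a function of video age. The 30-day cutoff for "recent uploads" is an assumption (the lesson does not define it), and real tiering would likely also weigh access frequency.

```bash
#!/bin/bash
# Sketch: pick a storage tier from a video's age in days.
HOT_MAX_AGE_DAYS=30     # assumption: "recent uploads" means the last 30 days
WARM_MAX_AGE_DAYS=365   # lesson: warm storage covers the last year

# storage_tier <age_days>: print hot, warm, or cold.
storage_tier() {
  if [ "$1" -le "$HOT_MAX_AGE_DAYS" ]; then
    echo "hot"
  elif [ "$1" -le "$WARM_MAX_AGE_DAYS" ]; then
    echo "warm"
  else
    echo "cold"
  fi
}

for age in 3 200 2000; do
  echo "$age days -> $(storage_tier "$age")"
done
```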

Real world

This lesson models YouTube's storage as Google Cloud Storage (GCS). At 500 hours of video uploaded per minute × an average of 2 GB/hour, raw ingest is ~1 TB per minute; storing multiple resolutions brings this to ~5 TB of new storage per minute. Total YouTube storage is estimated at 1+ exabyte.

Capacity math

Raw upload: 500 hours/min × 2 GB/hour = 1 TB/min raw. After transcoding (5 resolutions): ~5 TB/min total. GCS durability: 11 nines. CDN origin fetches use byte-range requests, assumed here at ~1 Gbps per segment fetch.
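The arithmetic above can be reproduced directly. The sketch follows the lesson's simplification that five renditions cost about five times the raw size, and uses decimal units (1 TB = 1000 GB); the per-day figure is derived from those same inputs.

```bash
#!/bin/bash
# Sketch: storage growth from the lesson's inputs.
HOURS_PER_MIN=500        # hours of video uploaded per minute
GB_PER_HOUR=2            # average raw size per hour of video
RENDITION_MULTIPLIER=5   # lesson's simplification: 5 resolutions ~ 5x raw size

raw_gb_per_min=$(( HOURS_PER_MIN * GB_PER_HOUR ))
total_gb_per_min=$(( raw_gb_per_min * RENDITION_MULTIPLIER ))
total_tb_per_day=$(( total_gb_per_min * 60 * 24 / 1000 ))

echo "raw:   $raw_gb_per_min GB/min"      # 1000 GB/min = 1 TB/min
echo "total: $total_gb_per_min GB/min"    # 5000 GB/min = 5 TB/min
echo "daily: $total_tb_per_day TB/day"    # 7200 TB/day
```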

Add Kafka for Upload Events

Add Kafka to carry upload-complete events from the upload API to transcoding workers, search indexers, and recommendation model updaters.

What it does

Kafka topics: upload-complete (triggers transcoding pipeline), transcode-complete (triggers CDN prewarming and search indexing), view-events (video plays, watch time, engagement for recommendation model), and ad-events (monetization tracking).
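As a rough sketch, the four topics could be declared with Kafka's `kafka-topics.sh` tool. The script below only prints the commands so that it runs without a broker. The partition counts, replication factor, and broker address are assumptions; the 7-day retention comes from the lesson's capacity figures.

```bash
#!/bin/bash
# Sketch: print the kafka-topics.sh commands that would create the lesson's topics.
# Partition counts, replication factor, and broker address are assumptions.
RETENTION_MS=$(( 7 * 24 * 60 * 60 * 1000 ))   # 7 days = 604800000 ms

# topic_create_cmd <topic> <partitions>: print one create command.
topic_create_cmd() {
  echo "kafka-topics.sh --bootstrap-server ${BOOTSTRAP:-localhost:9092}" \
       "--create --if-not-exists --topic $1 --partitions $2" \
       "--replication-factor 3 --config retention.ms=$RETENTION_MS"
}

# topic:partitions pairs; high-volume topics get more partitions.
for spec in upload-complete:12 transcode-complete:12 view-events:64 ad-events:32; do
  topic_create_cmd "${spec%%:*}" "${spec##*:}"
done
```

Piping the output to `sh` on a machine with the Kafka CLI installed would apply it; `view-events` gets the most partitions here because it carries by far the highest event rate.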

Why it matters

Video processing is a pipeline with multiple stages and different SLAs. Kafka decouples each stage: transcoding takes minutes; search indexing takes seconds; recommendation model updates are batched hourly. Without Kafka, a slow stage would block all subsequent processing.

Trade-off

Kafka makes the pipeline resilient to failures: if the search indexer is down, its events accumulate in Kafka and are processed when the indexer recovers. The trade-off is eventual consistency: a newly uploaded video may not appear in search for 30–60 seconds.
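How long a recovered consumer takes to catch up follows from the backlog and the net drain rate. The sketch below assumes a 10-minute indexer outage, the lesson's ~8 events/second arrival rate, and a hypothetical consumer throughput of 50 events/second.

```bash
#!/bin/bash
# Sketch: time for a recovered consumer to drain its Kafka backlog.

# drain_seconds <backlog_events> <consume_per_sec> <produce_per_sec>
# Prints whole seconds (rounded up), or "never" if the consumer cannot keep up.
drain_seconds() {
  net=$(( $2 - $3 ))
  if [ "$net" -le 0 ]; then
    echo "never"
    return 1
  fi
  echo $(( ($1 + net - 1) / net ))   # ceiling division
}

outage_seconds=600    # assumption: indexer down for 10 minutes
produce_rate=8        # lesson: ~8 upload events per second
consume_rate=50       # assumption: indexer throughput once healthy

backlog=$(( outage_seconds * produce_rate ))
echo "backlog: $backlog events"
echo "drain time: $(drain_seconds "$backlog" "$consume_rate" "$produce_rate") s"
```

During the outage and the drain period, search staleness exceeds the usual 30–60 seconds, which is the eventual-consistency cost described above.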

Real world

YouTube's upload pipeline (called 'Viper' in this lesson) processes 500 hours of video per minute. Transcoding is described as running on Google's custom TPU/GPU infrastructure, and the path from upload to available-for-viewing takes ~5 minutes for standard videos.

Capacity math

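Kafka ingest: assuming ~500 uploads/minute (a separate assumption from the 500 hours/minute of footage), that is ~8 uploads/second. Each upload triggers ~10 downstream events across transcoding pipeline stages, or ~80 events/second. View events: 1B+ per day = ~12K/second. Kafka retention: 7 days.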
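These rates can be checked with integer arithmetic. The inputs are the lesson's figures; the retained-event count is derived from them and assumes every view event is kept for the full retention window.

```bash
#!/bin/bash
# Sketch: Kafka event rates from the lesson's inputs (integer division rounds down).
UPLOADS_PER_MIN=500
EVENTS_PER_UPLOAD=10
VIEWS_PER_DAY=1000000000
RETENTION_DAYS=7

upload_events_per_sec=$(( UPLOADS_PER_MIN * EVENTS_PER_UPLOAD / 60 ))
view_events_per_sec=$(( VIEWS_PER_DAY / 86400 ))
retained_view_events=$(( VIEWS_PER_DAY * RETENTION_DAYS ))

echo "pipeline events: $upload_events_per_sec/s"       # 83/s
echo "view events:     $view_events_per_sec/s"         # 11574/s
echo "retained views:  $retained_view_events events"   # 7000000000
```

View events outnumber pipeline events by more than two orders of magnitude, so `view-events` is the topic that drives partition count and broker sizing.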

§2 Step 3: Deep Dive

Protocol	Typical latency	Adaptive bitrate	CDN cacheable	Best for	Cost	Ops burden
HLS (HTTP Live Streaming)	6–30 s buffer	Yes (variant playlist)	Yes (HTTP segments)	VOD, live streaming, iOS	Medium	Low
DASH (MPEG-DASH)	2–10 s buffer	Yes (MPD manifest)	Yes (HTTP segments)	Cross-platform VOD, DRM	Medium	Low
WebRTC	<100 ms	Encoder-side only (no rendition ladder)	No (peer-to-peer)	Live conferencing, sub-second latency	Medium	Medium
RTMP (legacy)	1–3 s	No	No (persistent connection)	Ingest only (OBS → transcoder)	Low	Low
Progressive download (MP4)	Full buffering	No	Yes (byte range)	Short clips, no adaptive bitrate needed	Low	Low

Video streaming protocols. HLS and DASH both combine adaptive bitrate with CDN-cacheable HTTP segments; this lesson uses HLS.

FFmpeg adaptive bitrate transcoding + S3 HLS upload (bash)

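#!/bin/bash
# Usage: transcode.sh <input.mp4> <output_dir>
set -euo pipefail

INPUT="$1"
OUTPUT_DIR="$2"

mkdir -p "$OUTPUT_DIR"

# Scaling happens inside the filter graph because -vf cannot be combined with
# labeled -filter_complex outputs. Per-stream options need an index (-c:v:0,
# -b:v:0, ...), and each HLS variant needs its own audio stream, so the audio is
# mapped four times. Forcing a keyframe every 6 s keeps segment boundaries
# aligned across renditions.
ffmpeg -i "$INPUT" \
  -filter_complex "[0:v]split=4[v1][v2][v3][v4];[v1]scale=1920:1080[v1080];[v2]scale=1280:720[v720];[v3]scale=854:480[v480];[v4]scale=640:360[v360]" \
  -map "[v1080]" -c:v:0 libx264 -b:v:0 5000k \
  -map "[v720]"  -c:v:1 libx264 -b:v:1 2800k \
  -map "[v480]"  -c:v:2 libx264 -b:v:2 1400k \
  -map "[v360]"  -c:v:3 libx264 -b:v:3 800k \
  -preset fast -force_key_frames "expr:gte(t,n_forced*6)" \
  -map 0:a:0 -map 0:a:0 -map 0:a:0 -map 0:a:0 -c:a aac -b:a 128k \
  -f hls -hls_time 6 -hls_list_size 0 -hls_playlist_type vod \
  -hls_segment_filename "$OUTPUT_DIR/%v/seg%03d.ts" \
  -master_pl_name master.m3u8 \
  -var_stream_map "v:0,a:0 v:1,a:1 v:2,a:2 v:3,a:3" \
  "$OUTPUT_DIR/%v/index.m3u8"

# Segments are immutable once written, so they can be cached for a year.
aws s3 sync "$OUTPUT_DIR/" "s3://cdn-origin/videos/$(basename "$INPUT" .mp4)/" \
  --cache-control "max-age=31536000"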
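On the client side, adaptive bitrate means choosing a rendition from this ladder based on measured bandwidth. The sketch below is a simplified throughput rule, not any particular player's algorithm: the 80% safety margin and the `pick_rendition` name are assumptions, and the bitrates are the script's video rates plus 128 kbps audio.

```bash
#!/bin/bash
# Sketch: pick the highest rendition that fits within a share of measured bandwidth.
# Ladder = video bitrate + 128 kbps audio from the FFmpeg script, highest first.
LADDER="1080p:5128 720p:2928 480p:1528 360p:928"
SAFETY_PERCENT=80   # assumption: use at most 80% of measured bandwidth

# pick_rendition <measured_kbps>: print the chosen rendition name.
pick_rendition() {
  budget=$(( $1 * SAFETY_PERCENT / 100 ))
  choice="360p"   # floor: fall back to the lowest rendition if nothing fits
  for entry in $LADDER; do
    name="${entry%%:*}"
    kbps="${entry##*:}"
    if [ "$kbps" -le "$budget" ]; then
      choice="$name"
      break
    fi
  done
  echo "$choice"
}

for bw in 8000 4000 2000 1000; do
  echo "$bw kbps -> $(pick_rendition "$bw")"
done
```

Real players also consider buffer level and recent throughput variance; a pure throughput rule like this one tends to oscillate when bandwidth sits near a ladder boundary.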

Component	Why Add It	Tradeoff
API Gateway	Without a unified gateway, each backend service needs auth and rate limiting.	The gateway is in the critical path for every request.
Object Storage for Video	Video is write-once, read-many.	Object storage has variable latency (50–500ms per request).
Kafka for Upload Events	Video processing is a pipeline with multiple stages and different SLAs.	Eventual consistency: a newly uploaded video may not appear in search for 30–60 seconds.

Design decision tradeoffs

Transcoding Worker Crash

worker-1 crashes mid-transcode on a large video. The partially transcoded file is corrupt. How do you implement idempotent transcoding jobs with checkpointing so the job restarts from where it left off without re-processing completed segments?
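One possible answer, sketched below, treats each finished output segment as its own checkpoint: a worker encodes to a temporary file and renames it into place only on success, so a restart skips whatever already exists under its final name. The sketch assumes the video has already been split into independently encodable segments, and `transcode_segment` is a stand-in for a real encoder call.

```bash
#!/bin/bash
# Sketch: idempotent, resumable per-segment transcoding.

# Stand-in for a real encoder invocation; it uppercases text so the sketch runs anywhere.
transcode_segment() {
  tr 'a-z' 'A-Z' < "$1" > "$2"
}

# run_job <input_dir> <output_dir>: process each *.seg file at most once.
# An output file under its final name is the checkpoint, because it only appears
# after a successful encode followed by an atomic rename.
run_job() {
  in_dir="$1"
  out_dir="$2"
  mkdir -p "$out_dir"
  for src in "$in_dir"/*.seg; do
    [ -e "$src" ] || continue             # no input segments at all
    name="$(basename "$src")"
    dst="$out_dir/$name"
    [ -e "$dst" ] && continue             # checkpoint hit: already transcoded
    tmp="$dst.tmp.$$"
    if transcode_segment "$src" "$tmp"; then
      mv "$tmp" "$dst"                    # atomic within a single filesystem
      echo "done $name"
    else
      rm -f "$tmp"
      return 1
    fi
  done
}

# Demo: segment 000 finished before the "crash"; 001 was left half-written.
demo_in="$(mktemp -d)"
demo_out="$(mktemp -d)"
printf 'a' > "$demo_in/000.seg"
printf 'b' > "$demo_in/001.seg"
printf 'A' > "$demo_out/000.seg"
printf 'partial' > "$demo_out/001.seg.tmp.12345"
run_job "$demo_in" "$demo_out"            # only 001.seg is processed
```

Partial temp files from a crashed worker are never mistaken for finished output; they can be garbage-collected separately. Running the job twice is harmless, which is what makes redelivered queue messages safe.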

Viral Video CDN Origin Overload

A video goes viral and 50M users try to stream it simultaneously. cdn-1 cache misses cause a stampede on storage-1 and playback-api. How do you implement CDN origin shield, cache coalescing (only one origin request per cache miss), and adaptive bitrate selection to reduce origin load?
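The effect of an origin shield can be shown with a small sequential simulation: many edge caches miss on the same segment, and the shield tier absorbs all but one origin fetch. Caches are modeled as directories and the names are hypothetical. The sketch does not model concurrent misses, which is the case request coalescing addresses (nginx's `proxy_cache_lock` is one example of that mechanism).

```bash
#!/bin/bash
# Sketch: count origin fetches with and without a shield tier between edge and origin.
ORIGIN_FETCHES=0
CACHE_ROOT=""

# edge_request <edge_id> <segment_key> <use_shield: yes|no>
edge_request() {
  edge_dir="$CACHE_ROOT/edge-$1"
  shield_dir="$CACHE_ROOT/shield"
  mkdir -p "$edge_dir" "$shield_dir"
  [ -e "$edge_dir/$2" ] && return 0                 # edge cache hit
  if [ "$3" = "yes" ]; then
    if [ ! -e "$shield_dir/$2" ]; then
      ORIGIN_FETCHES=$(( ORIGIN_FETCHES + 1 ))      # shield miss: one origin fetch
      : > "$shield_dir/$2"
    fi
  else
    ORIGIN_FETCHES=$(( ORIGIN_FETCHES + 1 ))        # edge fetches straight from origin
  fi
  : > "$edge_dir/$2"                                # fill the edge cache
}

# simulate <num_edges> <use_shield>: two viewers per edge request the same segment.
simulate() {
  ORIGIN_FETCHES=0
  CACHE_ROOT="$(mktemp -d)"
  e=0
  while [ "$e" -lt "$1" ]; do
    edge_request "$e" "vid42-seg000" "$2"
    edge_request "$e" "vid42-seg000" "$2"           # second viewer: edge hit
    e=$(( e + 1 ))
  done
  echo "$ORIGIN_FETCHES"
}

echo "origin fetches without shield: $(simulate 50 no)"    # 50
echo "origin fetches with shield:    $(simulate 50 yes)"   # 1
```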

Upload Processing Queue Backlog

A major event causes 100K video uploads in 1 hour. The transcoding queue fills to 500K jobs. New uploads wait 48 hours for processing. How do you implement priority lanes (premium vs. free uploads), auto-scaling worker-1 instances, and progress notifications to users?
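Two parts of this scenario reduce to small calculations: how many workers are needed to clear the backlog within a target time, and which jobs to dequeue first. The per-worker throughput (10 jobs/hour), the 2-hour target, and the queue line format are assumptions made for this sketch.

```bash
#!/bin/bash
# Sketch: autoscaling target and strict-priority dequeue order.

# workers_needed <backlog_jobs> <jobs_per_worker_per_hour> <target_hours>
workers_needed() {
  capacity=$(( $2 * $3 ))                      # jobs one worker clears in the window
  echo $(( ($1 + capacity - 1) / capacity ))   # ceiling division
}

# next_jobs <n>: read "lane_rank submitted_at job_id" lines on stdin and print
# the next n job ids, premium lane (rank 0) first, oldest first within a lane.
next_jobs() {
  sort -k1,1n -k2,2n | head -n "$1" | awk '{ print $3 }'
}

echo "fleet implied by a 48 h drain: $(workers_needed 500000 10 48) workers"   # 1042
echo "fleet needed for a 2 h drain:  $(workers_needed 500000 10 2) workers"    # 25000

printf '1 100 free-a\n0 200 premium-b\n1 50 free-c\n' | next_jobs 2
```

Strict priority can starve the free lane during a long spike; a weighted split between lanes is a common alternative.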

Keep upload, playback, and feed/search as separate paths. Upload API writes metadata, stores the raw object, and emits an event. Playback API should only resolve manifests and metadata, while the client pulls actual video bytes from the CDN.

Use Kafka as the backbone for all asynchronous work: upload-complete events trigger transcoding, indexing workers push searchable metadata into the search tier, and recommendation workers refresh hot user/video data into Redis so the feed API serves precomputed candidates quickly.

The dominant scale problem is delivery. The object store is the origin of truth, but the CDN must carry the heavy playback traffic. Metadata and recommendations can sit in Postgres + Redis, while immutable video segments stay content-addressed and cacheable at the edge for long periods.
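Content addressing means the object key is derived from the bytes themselves, so a key never points at changed content and can be cached indefinitely. A minimal sketch, assuming SHA-256 and a hypothetical `segments/<prefix>/<digest>.ts` key layout:

```bash
#!/bin/bash
# Sketch: derive a content-addressed object key from a segment's SHA-256 digest.

# sha256_of <file>: print the hex digest (sha256sum on Linux, shasum on macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# segment_key <file>: print the object key; the 2-character prefix spreads keys
# across the keyspace. The layout is an assumption for illustration.
segment_key() {
  digest="$(sha256_of "$1")"
  prefix="$(printf '%s' "$digest" | cut -c1-2)"
  echo "segments/$prefix/$digest.ts"
}

sample="$(mktemp)"
printf 'abc' > "$sample"
segment_key "$sample"
# prints segments/ba/ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad.ts
```

Identical segments map to the same key, which also gives deduplication for free.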

§3 Step 4: Wrap Up

Decision	Choice	Why
API Gateway	The API gateway handles: SSL termination, authentication (OAuth2), routing (upload → upload service, playback → playback service, search → search API, feed → recommendation API), rate limiting (100 uploads/day/channel), and bot detection for view counts.	Without a unified gateway, each backend service needs auth and rate limiting.
Object Storage for Video	Object storage holds: raw uploaded video (multi-GB files), transcoded HLS segments per resolution (360p/720p/1080p/4K), VP9/AV1 encoded variants, audio-only streams, and thumbnail images.	Video is write-once, read-many.
Kafka for Upload Events	Kafka topics: upload-complete (triggers transcoding pipeline), transcode-complete (triggers CDN prewarming and search indexing), view-events (video plays, watch time, engagement for recommendation model), and ad-events (monetization tracking).	Video processing is a pipeline with multiple stages and different SLAs.

Key design decisions

If the interviewer asks to scale 10×: identify the single bottleneck first (usually the database write path) and address it before scaling horizontally.

10× target: 50.0M RPS, the load at which your architecture must hold.
