Open on desktop
Antimetal's interactive diagrams require a larger screen. Open this page on your laptop or desktop to continue.
TikTok For You Page
§1Step 2 — High-Level Design
Serve personalized video feeds with sub-200ms cold start. Multi-modal embeddings and online learning.
Interactive diagram locked
Upgrade to Pro to build and run this system.
Interactive diagram locked
Upgrade to Pro to build and run this system.
Add Kafka to carry every user interaction (watch time, like, share, skip, comment) from the For You Page to ML feature pipelines.
Kafka topics: engagement-events (videoID, userID, watchTime, completionRate, liked, shared, commented), upload-events (new video available for indexing), and notification-events (viral video triggers for creator). Stream processors consume engagement events to update real-time ML features.
TikTok's FYP algorithm is trained on billions of micro-signals: did the user watch >25% of the video? Did they rewatch? Did they share? These signals are captured client-side and streamed to Kafka, enabling near-real-time model updates.
High-frequency engagement events create enormous Kafka ingest: 1B users × 30 min/day average × 1 event/second = 1.8 trillion events/day. TikTok uses aggressive client-side batching (send every 5–10 events) to reduce Kafka write rate.
TikTok's recommendation system is widely regarded as the most effective content recommendation engine ever deployed. Their engineering blog describes a two-stage ranking: candidate retrieval (collaborative filtering) followed by ranking (deep neural network scoring).
Estimated Kafka ingest: 10M+ events/second at peak. Event size: ~200 bytes. Kafka retention: 24 hours for real-time consumers. Daily volume: ~1 TB compressed engagement data.
Add a stream processor (Flink) to consume Kafka engagement events and compute real-time ML features: per-video popularity decay, per-user interest updates, and trending signals.
Flink jobs compute: (1) video engagement rate (likes+shares+comments / views, windowed over 1h/6h/24h), (2) user interest vector updates (which categories/creators the user engaged with in last 30 min), (3) trending videos per geographic region, and (4) creator notification triggers (video crossed 10K/100K views).
Real-time features differentiate TikTok's FYP from batch-only recommendation systems. A video going viral in the last hour should appear in FYP immediately — not wait for a daily batch job. Flink enables sub-minute feature freshness.
Stateful stream processing (windowed aggregations) requires distributed state management. Flink uses RocksDB for operator state, but large state can cause slow checkpoint intervals. TikTok limits state size by aggressively expiring old windows.
ByteDance (TikTok's parent) is a major contributor to Apache Flink. They run Flink at petabyte-scale for multiple products. TikTok's real-time ML feature pipeline processes 100K+ events/second per Flink job.
Flink window size: 1h/6h/24h sliding windows. Feature update latency: <30 seconds. Flink parallelism: 100s of task managers per job. State size: ~1 TB per major windowed job.
Add Redis to cache pre-computed FYP candidate lists per user — the API reads precomputed candidates from Redis rather than running full ML inference per request.
A background ranking pipeline periodically generates personalized candidate lists per user (50–200 video candidates ranked by predicted engagement probability). These are stored in Redis keyed by userID. The FYP API reads from Redis and returns the next N videos, maintaining a cursor per user.
Two-stage ranking (candidate retrieval + scoring) takes 100–500ms. At 1B users, running this synchronously per request is infeasible. Pre-generating candidates in the background allows the API to serve FYP from Redis in <20ms.
Pre-generated candidates become stale between refreshes. TikTok refreshes FYP candidates every 5–15 minutes. A user who watches the same app session for 30 minutes may see the same video twice if the candidate pool is exhausted — solved by tracking shown videos per session.
TikTok's FYP uses a two-phase approach: offline candidate generation (collaborative filtering on the full item corpus) + online ranking (neural network scoring of ~500 candidates in real-time). The pre-computed candidates balance personalization with latency.
Redis entry per user: ~2 KB (200 video IDs). At 1B users: 2 TB. Candidate refresh rate: every 10 minutes. FYP API p99: <50ms for Redis reads. Ranking model inference: <100ms for 500 candidates.
§2Step 3 — Deep Dive
Kafka topics: engagement-events (videoID, userID, watchTime, completionRate, liked, shared, commented), upload-events (new video available for indexing), and notification-events (viral video triggers for creator). Stream processors consume engagement events to update real-time ML features.
| Algorithm | Scalability | Personalization | Cold start | Best for | Cost | Ops burden |
|---|---|---|---|---|---|---|
| Two-stage (candidate gen + ranking) | Billions of items | Deep (transformer) | Moderate | TikTok FYP, YouTube, Spotify ✓ | High | High |
| Collaborative filtering (matrix factorization) | Millions of items | High | Poor | Netflix movies, Spotify playlists | High | High |
| Content-based filtering | Any scale | Medium | Good | New item cold start, news feeds | Medium | Medium |
| Two-tower model (DSSM) | Billions (ANN lookup) | High | Moderate | Candidate retrieval stage ✓ | High | High |
| Sequential model (SASRec) | Moderate (re-rank) | Very high | Poor | Re-ranking stage, session signals | High | High |
Recommendation algorithms — two-stage ranking wins for billion-item catalogs.
import numpy as np
import faiss
# Stage 1: Candidate retrieval -- ANN search in embedding space
class CandidateRetriever:
def __init__(self, item_embeddings: np.ndarray, item_ids: list):
dim = item_embeddings.shape[1]
# IVF flat index: 1000 clusters, search top-500 candidates in <10ms
self.index = faiss.IndexIVFFlat(faiss.IndexFlatL2(dim), dim, 1000)
self.index.train(item_embeddings)
self.index.add(item_embeddings)
self.index.nprobe = 64 # probe 64 clusters -- recall/latency tradeoff
self.ids = item_ids
def retrieve(self, user_embedding: np.ndarray, k: int = 500) -> list:
_, indices = self.index.search(user_embedding.reshape(1, -1), k)
return [self.ids[i] for i in indices[0] if i >= 0]
# Stage 2: Re-ranking -- pointwise deep model on 500 candidates
def rerank(user_id: str, candidates: list, ranking_model) -> list:
features = [build_features(user_id, cid) for cid in candidates]
scores = ranking_model.predict(features) # pCTR x pCompletion x pShare
ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
return [cid for cid, _ in ranked[:20]]
# Final FYP feed: top-20 from 500 candidates, from 1M+ item pool
# Diversity injection: at most 2 consecutive videos from same creator| Component | Why Add It | Tradeoff |
|---|---|---|
| Kafka for Engagement Events | TikTok's FYP algorithm is trained on billions of micro-signals: did the user watch >25% of the video? | High-frequency engagement events create enormous Kafka ingest: 1B users × 30 min/day average × 1 event/second = 1. |
| Stream Processor for Real-Time Features | Real-time features differentiate TikTok's FYP from batch-only recommendation systems. | Stateful stream processing (windowed aggregations) requires distributed state management. |
| Redis for FYP Candidate Cache | Two-stage ranking (candidate retrieval + scoring) takes 100–500ms. | Pre-generated candidates become stale between refreshes. |
Design decision tradeoffs
cache-1 runs out of memory and mass-evicts pre-computed FYP candidate lists. Every subsequent API request misses the cache and triggers real-time inference. The ranking model fleet is overwhelmed. How do you implement staggered TTLs, cache warming, and fallback to trending videos?
A viral challenge causes 10x normal FYP request traffic. The ML ranking model replicas are saturated; inference queues grow to 50K pending requests. Latency spikes from 80ms to 5 seconds. How do you implement request batching, model autoscaling, and simplified ranking as a fallback?
A viral video is requested by 500M users in 1 hour. All requests hit the same storage shard for that video's data. Throughput is saturated. How do you implement CDN offloading, read replicas, and consistent hashing to distribute hot content across shards?
§3Step 4 — Wrap Up
| Decision | Choice | Why |
|---|---|---|
| Kafka for Engagement Events | Kafka topics: engagement-events (videoID, userID, watchTime, completionRate, liked, shared, commented), upload-events (new video available for indexing), and notification-events (viral video triggers for creator). | TikTok's FYP algorithm is trained on billions of micro-signals: did the user watch >25% of the video? |
| Stream Processor for Real-Time Features | Flink jobs compute: (1) video engagement rate (likes+shares+comments / views, windowed over 1h/6h/24h), (2) user interest vector updates (which categories/creators the user engaged with in last 30 min), (3) trending videos per geographic region, and (4) creator notification triggers (video crossed 10K/100K views). | Real-time features differentiate TikTok's FYP from batch-only recommendation systems. |
| Redis for FYP Candidate Cache | A background ranking pipeline periodically generates personalized candidate lists per user (50–200 video candidates ranked by predicted engagement probability). | Two-stage ranking (candidate retrieval + scoring) takes 100–500ms. |
Key design decisions