Open on desktop

Antimetal's interactive diagrams require a larger screen. Open this page on your laptop or desktop to continue.

Best on desktop

Back to lesson

TikTok For You Page

advancedMLRecommendations

Large-Scale·85 min read

TikTok For You Page

1Understand the Problem & Establish Design Scope→2High-Level Design→3Deep Dive→4Wrap Up

MLRecommendations

§1Step 2 — High-Level Design

2High-Level Design

Serve personalized video feeds with sub-200ms cold start. Multi-modal embeddings and online learning.

System architecture overview

Interactive diagram locked

Upgrade to Pro to build and run this system.

Stage 1 of 4Starting state — the problem to solve

Progressive build — add each component step by step

Interactive diagram locked

Upgrade to Pro to build and run this system.

Add Kafka for Engagement Events

Add Kafka to carry every user interaction (watch time, like, share, skip, comment) from the For You Page to ML feature pipelines.

What it does

Kafka topics: engagement-events (videoID, userID, watchTime, completionRate, liked, shared, commented), upload-events (new video available for indexing), and notification-events (viral video triggers for creator). Stream processors consume engagement events to update real-time ML features.

Why it matters

TikTok's FYP algorithm is trained on billions of micro-signals: did the user watch >25% of the video? Did they rewatch? Did they share? These signals are captured client-side and streamed to Kafka, enabling near-real-time model updates.

Trade-off

High-frequency engagement events create enormous Kafka ingest: 1B users × 30 min/day average × 1 event/second = 1.8 trillion events/day. TikTok uses aggressive client-side batching (send every 5–10 events) to reduce Kafka write rate.

Real world

TikTok's recommendation system is widely regarded as the most effective content recommendation engine ever deployed. Their engineering blog describes a two-stage ranking: candidate retrieval (collaborative filtering) followed by ranking (deep neural network scoring).

Capacity math

Estimated Kafka ingest: 10M+ events/second at peak. Event size: ~200 bytes. Kafka retention: 24 hours for real-time consumers. Daily volume: ~1 TB compressed engagement data.

In the real world: TikTok's recommendation system is widely regarded as the most effective content recommendation engine ever deployed. Their engineering blog describes a two-stage ranking: candidate retrieval (collaborative filtering) followed by ranking (deep neural network scoring).

Add Stream Processor for Real-Time Features

Add a stream processor (Flink) to consume Kafka engagement events and compute real-time ML features: per-video popularity decay, per-user interest updates, and trending signals.

What it does

Flink jobs compute: (1) video engagement rate (likes+shares+comments / views, windowed over 1h/6h/24h), (2) user interest vector updates (which categories/creators the user engaged with in last 30 min), (3) trending videos per geographic region, and (4) creator notification triggers (video crossed 10K/100K views).

Why it matters

Real-time features differentiate TikTok's FYP from batch-only recommendation systems. A video going viral in the last hour should appear in FYP immediately — not wait for a daily batch job. Flink enables sub-minute feature freshness.

Trade-off

Stateful stream processing (windowed aggregations) requires distributed state management. Flink uses RocksDB for operator state, but large state can cause slow checkpoint intervals. TikTok limits state size by aggressively expiring old windows.

Real world

ByteDance (TikTok's parent) is a major contributor to Apache Flink. They run Flink at petabyte-scale for multiple products. TikTok's real-time ML feature pipeline processes 100K+ events/second per Flink job.

Capacity math

Flink window size: 1h/6h/24h sliding windows. Feature update latency: <30 seconds. Flink parallelism: 100s of task managers per job. State size: ~1 TB per major windowed job.

In the real world: ByteDance (TikTok's parent) is a major contributor to Apache Flink. They run Flink at petabyte-scale for multiple products. TikTok's real-time ML feature pipeline processes 100K+ events/second per Flink job.

Add Redis for FYP Candidate Cache

Add Redis to cache pre-computed FYP candidate lists per user — the API reads precomputed candidates from Redis rather than running full ML inference per request.

What it does

A background ranking pipeline periodically generates personalized candidate lists per user (50–200 video candidates ranked by predicted engagement probability). These are stored in Redis keyed by userID. The FYP API reads from Redis and returns the next N videos, maintaining a cursor per user.

Why it matters

Two-stage ranking (candidate retrieval + scoring) takes 100–500ms. At 1B users, running this synchronously per request is infeasible. Pre-generating candidates in the background allows the API to serve FYP from Redis in <20ms.

Trade-off

Pre-generated candidates become stale between refreshes. TikTok refreshes FYP candidates every 5–15 minutes. A user who watches the same app session for 30 minutes may see the same video twice if the candidate pool is exhausted — solved by tracking shown videos per session.

Real world

TikTok's FYP uses a two-phase approach: offline candidate generation (collaborative filtering on the full item corpus) + online ranking (neural network scoring of ~500 candidates in real-time). The pre-computed candidates balance personalization with latency.

Capacity math

Redis entry per user: ~2 KB (200 video IDs). At 1B users: 2 TB. Candidate refresh rate: every 10 minutes. FYP API p99: <50ms for Redis reads. Ranking model inference: <100ms for 500 candidates.

In the real world: TikTok's FYP uses a two-phase approach: offline candidate generation (collaborative filtering on the full item corpus) + online ranking (neural network scoring of ~500 candidates in real-time). The pre-computed candidates balance personalization with latency.

Recommendation Cache Eviction Storm: cache-1 runs out of memory and mass-evicts pre-computed FYP candidate lists. Every subsequent API request misses the cache and triggers real-time inference. The ranking model fleet is overwhelmed. How do you implement staggered TTLs, cache warming, and fallback to trending videos?

§2Step 3 — Deep Dive

3Deep Dive

Algorithm	Scalability	Personalization	Cold start	Best for	Cost	Ops burden
Two-stage (candidate gen + ranking)	Billions of items	Deep (transformer)	Moderate	TikTok FYP, YouTube, Spotify ✓	High	High
Collaborative filtering (matrix factorization)	Millions of items	High	Poor	Netflix movies, Spotify playlists	High	High
Content-based filtering	Any scale	Medium	Good	New item cold start, news feeds	Medium	Medium
Two-tower model (DSSM)	Billions (ANN lookup)	High	Moderate	Candidate retrieval stage ✓	High	High
Sequential model (SASRec)	Moderate (re-rank)	Very high	Poor	Re-ranking stage, session signals	High	High

Recommendation algorithms — two-stage ranking wins for billion-item catalogs.

pythonTwo-stage FYP pipeline: ANN retrieval + pointwise re-ranking

import numpy as np
import faiss

# Stage 1: Candidate retrieval -- ANN search in embedding space
class CandidateRetriever:
    def __init__(self, item_embeddings: np.ndarray, item_ids: list):
        dim = item_embeddings.shape[1]
        # IVF flat index: 1000 clusters, search top-500 candidates in <10ms
        self.index = faiss.IndexIVFFlat(faiss.IndexFlatL2(dim), dim, 1000)
        self.index.train(item_embeddings)
        self.index.add(item_embeddings)
        self.index.nprobe = 64   # probe 64 clusters -- recall/latency tradeoff
        self.ids = item_ids

    def retrieve(self, user_embedding: np.ndarray, k: int = 500) -> list:
        _, indices = self.index.search(user_embedding.reshape(1, -1), k)
        return [self.ids[i] for i in indices[0] if i >= 0]

# Stage 2: Re-ranking -- pointwise deep model on 500 candidates
def rerank(user_id: str, candidates: list, ranking_model) -> list:
    features = [build_features(user_id, cid) for cid in candidates]
    scores   = ranking_model.predict(features)   # pCTR x pCompletion x pShare
    ranked   = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [cid for cid, _ in ranked[:20]]

# Final FYP feed: top-20 from 500 candidates, from 1M+ item pool
# Diversity injection: at most 2 consecutive videos from same creator

Component	Why Add It	Tradeoff
Kafka for Engagement Events	TikTok's FYP algorithm is trained on billions of micro-signals: did the user watch >25% of the video?	High-frequency engagement events create enormous Kafka ingest: 1B users × 30 min/day average × 1 event/second = 1.
Stream Processor for Real-Time Features	Real-time features differentiate TikTok's FYP from batch-only recommendation systems.	Stateful stream processing (windowed aggregations) requires distributed state management.
Redis for FYP Candidate Cache	Two-stage ranking (candidate retrieval + scoring) takes 100–500ms.	Pre-generated candidates become stale between refreshes.

Design decision tradeoffs

Recommendation Cache Eviction Storm

cache-1 runs out of memory and mass-evicts pre-computed FYP candidate lists. Every subsequent API request misses the cache and triggers real-time inference. The ranking model fleet is overwhelmed. How do you implement staggered TTLs, cache warming, and fallback to trending videos?

Viral Content Inference Spike

A viral challenge causes 10x normal FYP request traffic. The ML ranking model replicas are saturated; inference queues grow to 50K pending requests. Latency spikes from 80ms to 5 seconds. How do you implement request batching, model autoscaling, and simplified ranking as a fallback?

§3Step 4 — Wrap Up

4Wrap Up

Decision	Choice	Why
Kafka for Engagement Events	Kafka topics: engagement-events (videoID, userID, watchTime, completionRate, liked, shared, commented), upload-events (new video available for indexing), and notification-events (viral video triggers for creator).	TikTok's FYP algorithm is trained on billions of micro-signals: did the user watch >25% of the video?
Stream Processor for Real-Time Features	Flink jobs compute: (1) video engagement rate (likes+shares+comments / views, windowed over 1h/6h/24h), (2) user interest vector updates (which categories/creators the user engaged with in last 30 min), (3) trending videos per geographic region, and (4) creator notification triggers (video crossed 10K/100K views).	Real-time features differentiate TikTok's FYP from batch-only recommendation systems.
Redis for FYP Candidate Cache	A background ranking pipeline periodically generates personalized candidate lists per user (50–200 video candidates ranked by predicted engagement probability).	Two-stage ranking (candidate retrieval + scoring) takes 100–500ms.

Key design decisions

If the interviewer asks to scale 10×: 10x the load — architectural moves that work. Identify the single bottleneck (usually the database write path) and address it first before horizontal scaling.

10× Target30.0M RPSwhere your architecture must hold

What's next

Large-Scale

Google Maps Architecture