Twitter/X Architecture

Step 2: High-Level Design

Goal: deliver tweets to 200M daily users in under 5 seconds. Scope: timeline fan-out, moments, and trending topics.

Add Kafka for Tweet Events

Add Kafka to carry tweet creation events, engagement signals, and timeline refresh triggers from the tweet API to fan-out and search indexing workers.

What it does

Kafka topics: tweet-events (new tweets, retweets, deletes), engagement-events (likes, replies, quote tweets), and timeline-refresh-events (signals to re-rank a user's home timeline). Fan-out service and search indexers both consume from tweet-events.
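A minimal sketch of how event types might be routed to the three topics named above. The `TweetEvent` record, the event-type strings, and the `topic_for` and `partition_key` helpers are hypothetical names for illustration, not Twitter's actual schema. Keying by author ID is also an assumption, chosen so that one author's events stay ordered within a partition.

```python
from dataclasses import dataclass

# Topic names come from the text above; the event-type strings are illustrative.
TOPIC_BY_EVENT_TYPE = {
    "tweet": "tweet-events",
    "retweet": "tweet-events",
    "delete": "tweet-events",
    "like": "engagement-events",
    "reply": "engagement-events",
    "quote": "engagement-events",
    "timeline_refresh": "timeline-refresh-events",
}


@dataclass(frozen=True)
class TweetEvent:
    """Hypothetical event record published by the tweet API."""
    event_type: str
    author_id: str
    tweet_id: int


def topic_for(event: TweetEvent) -> str:
    """Return the Kafka topic an event should be published to."""
    try:
        return TOPIC_BY_EVENT_TYPE[event.event_type]
    except KeyError:
        raise ValueError(f"unknown event type: {event.event_type}") from None


def partition_key(event: TweetEvent) -> bytes:
    """Key by author so one author's events share a partition (an assumption)."""
    return event.author_id.encode("utf-8")
```

A real producer would pass `topic_for(event)` and `partition_key(event)` to whichever Kafka client the service uses; that client call is left out here on purpose.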

Why it matters

Celebrity tweets (50M followers) cannot be fanned out synchronously in the write path: at 1 ms per write, 50M serial Redis writes would take about 14 hours. Kafka lets the tweet API return to the user immediately, while fan-out happens asynchronously at background priority.

Trade-off

Async fan-out means followers see celebrity tweets with a delay (seconds to minutes for very large accounts). Twitter (X) uses a hybrid model: tweets from accounts with <1M followers are pushed into followers' pre-computed timelines, while tweets from mega-celebrities are pulled on demand at read time.

Real world

Twitter's notorious 'Celebrity Problem': when Obama or Lady Gaga tweeted, the fan-out service would trigger millions of Redis write operations. Twitter solved this with the hybrid push/pull model, not pure fan-out.

Capacity math

Tweet creation rate: ~6K tweets/second. For a 50M-follower account: 50M fan-out writes in Redis. At 1 ms per write, that is 50,000 seconds (about 14 hours) of single-threaded work, so the fan-out service needs massive parallelism.
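The arithmetic above can be checked with a short sketch. The function names are illustrative, and the 5-second and 60-second deadlines are assumed targets used only to show how parallelism scales. Integer milliseconds are used to avoid floating-point rounding.

```python
def serial_fanout_ms(followers: int, write_latency_ms: int) -> int:
    """Total time if every follower write runs one after another."""
    return followers * write_latency_ms


def writers_needed(followers: int, write_latency_ms: int, deadline_ms: int) -> int:
    """Parallel writers required to finish within the deadline (ceiling division)."""
    total_ms = serial_fanout_ms(followers, write_latency_ms)
    return -(-total_ms // deadline_ms)


total_ms = serial_fanout_ms(50_000_000, 1)
print(total_ms / 1_000)        # serial seconds: 50000.0
print(total_ms / 3_600_000)    # serial hours: just under 14
print(writers_needed(50_000_000, 1, 5_000))    # writers for a 5 s deadline
print(writers_needed(50_000_000, 1, 60_000))   # writers for a 60 s deadline
```

Under these assumptions a 5-second deadline needs 10,000 concurrent writers, which is why pure push is impractical for the largest accounts.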

Add Redis Timeline Cache

Add Redis clusters for home timeline caching (pre-computed tweet ID lists per user) and tweet metadata cache (tweet content for timeline hydration).

What it does

Timeline cache: per-user Redis sorted set of tweet IDs (scored by timestamp), capped at 800 entries. When a new tweet is fanned out, it's pushed to every follower's timeline sorted set. Tweet cache: tweet content (text, media URLs, author ID) keyed by tweet ID for fast hydration.
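To make the capped sorted set concrete, here is an in-memory stand-in that mimics the behaviour described above without needing a Redis server. `CappedTimeline` and `hydrate` are hypothetical names; the comments note which Redis command each step corresponds to. It is a sketch of the semantics, not a replacement for Redis.

```python
class CappedTimeline:
    """In-memory stand-in for one user's Redis sorted set of tweet IDs."""

    def __init__(self, max_size: int = 800):
        self.max_size = max_size
        self._scores = {}  # tweet_id -> timestamp score

    def add(self, tweet_id: int, score: float) -> None:
        # Like ZADD: insert the member or update its score.
        self._scores[tweet_id] = score
        # Like ZREMRANGEBYRANK key 0 -(max_size + 1): evict the oldest overflow.
        overflow = len(self._scores) - self.max_size
        if overflow > 0:
            oldest = sorted(self._scores, key=self._scores.get)[:overflow]
            for tid in oldest:
                del self._scores[tid]

    def newest(self, limit: int) -> list:
        # Like ZREVRANGE key 0 limit-1: highest score (newest) first.
        ranked = sorted(self._scores, key=self._scores.get, reverse=True)
        return ranked[:limit]


def hydrate(tweet_ids: list, tweet_cache: dict) -> list:
    """Look up tweet bodies by ID, skipping IDs missing from the tweet cache."""
    return [tweet_cache[tid] for tid in tweet_ids if tid in tweet_cache]
```

Storing only IDs in the timeline and hydrating from a separate tweet cache means an edited or deleted tweet is updated in one place rather than in every follower's timeline.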

Why it matters

Computing a home timeline from scratch (joining all followed accounts' tweets, sorting by time) would require a full database scan. Pre-computing and maintaining a timeline cache in Redis reduces timeline load to a single Redis read whose cost is bounded by the 800-entry cap, so it is effectively O(1).

Trade-off

Twitter caps the timeline cache at 800 tweets per user. A user who has been away long enough to miss more than 800 tweets cannot be served entirely from the cache, so the timeline is rebuilt from the database on the next visit. This cold-start rebuild is the expensive path.

Real world

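Twitter's timeline service (originally called Timelines API) uses Redis clusters with a reported ~100 TB of total RAM for home timelines. Manhattan, sometimes named alongside it, is Twitter's distributed key-value database and a separate system. Each user's timeline sorted set holds about 6 KB of raw IDs (800 tweet IDs × 8 bytes = 6.4 KB), before Redis data-structure overhead and replication.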

Capacity math

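Timeline cache: 330M active users × 6 KB = ~2 TB of raw tweet IDs (sorted-set overhead and replication raise the real footprint above this). Tweet cache: 100M active tweets × 1 KB = 100 GB. Timeline read latency: <5 ms (one Redis ZREVRANGE on the sorted set). Fan-out write rate: 6K tweets/sec × avg 200 followers = 1.2M Redis writes/sec.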
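The sizing figures can be reproduced with a few lines of arithmetic. The helper names are illustrative, all inputs are the numbers quoted in the text, and sizes use decimal units (1 KB = 1,000 bytes). The result counts raw payload only.

```python
TWEET_ID_BYTES = 8
TIMELINE_CAP = 800


def timeline_bytes_per_user(cap: int = TIMELINE_CAP, id_bytes: int = TWEET_ID_BYTES) -> int:
    """Raw bytes of tweet IDs in one user's timeline (no Redis overhead)."""
    return cap * id_bytes


def total_bytes(items: int, bytes_per_item: int) -> int:
    """Raw payload size for a cache holding `items` entries."""
    return items * bytes_per_item


def fanout_writes_per_sec(tweets_per_sec: int, avg_followers: int) -> int:
    """Redis timeline writes generated per second by push fan-out."""
    return tweets_per_sec * avg_followers


per_user = timeline_bytes_per_user()                    # 6,400 bytes
print(total_bytes(330_000_000, per_user) / 1e12)        # timeline cache, TB
print(total_bytes(100_000_000, 1_000) / 1e9)            # tweet cache, GB
print(fanout_writes_per_sec(6_000, 200))                # writes per second
```

Using the exact 6.4 KB per user gives roughly 2.1 TB, in line with the rounded ~2 TB quoted above.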

Add Fan-Out Service

Add a dedicated fan-out service that expands a tweet event into per-follower Redis timeline writes, with special handling for high-follower accounts.

What it does

The fan-out service consumes tweet events from Kafka, looks up the author's follower list (sharded in a social graph store), and writes the tweet ID into each follower's Redis timeline sorted set. For accounts with >1M followers, fan-out is skipped — their tweets are injected at timeline read time (pull-based).
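One way to picture the routing decision is as a planning step that either skips fan-out or splits the follower list into work items for parallel workers. This is a sketch under assumptions: `delivery_mode`, `follower_batches`, `plan_fanout`, and the batch size of 1,000 are hypothetical, and the boundary is treated as "at or above 1M" since the threshold itself is approximate.

```python
from typing import Iterator, List, Tuple

CELEB_THRESHOLD = 1_000_000  # approximate threshold from the text


def delivery_mode(follower_count: int) -> str:
    """Return 'pull' for celebrity-scale accounts, 'push' otherwise."""
    return "pull" if follower_count >= CELEB_THRESHOLD else "push"


def follower_batches(follower_ids: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of the follower list, one per worker task."""
    for start in range(0, len(follower_ids), batch_size):
        yield follower_ids[start:start + batch_size]


def plan_fanout(tweet_id: int, follower_ids: List[str],
                batch_size: int = 1_000) -> List[Tuple[int, List[str]]]:
    """Return (tweet_id, batch) work items; empty when the author is pull-mode."""
    if delivery_mode(len(follower_ids)) == "pull":
        return []
    return [(tweet_id, batch) for batch in follower_batches(follower_ids, batch_size)]
```

Each work item can be handled by a different worker, which is how the service gets the parallelism the capacity math calls for.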

Why it matters

Pure push fan-out doesn't scale for celebrities: 50M writes per tweet would take hours if done serially and would spike Redis write throughput even when parallelized. The hybrid model (push for normal accounts, pull for celebrities) bounds the fan-out work to manageable levels.

Trade-off

Pull-based delivery for celebrities adds latency to timeline reads: the API must fetch the celebrity's recent tweets at read time and merge them with the pre-computed timeline. This increases timeline load from <5ms (pure Redis) to ~20ms (Redis + celebrity tweet merge).
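The read-time merge can be sketched as a k-way merge of lists that are already sorted newest-first. `merge_timelines` is a hypothetical helper, and the `(timestamp, tweet_id)` tuple layout is an assumption made so that standard tuple comparison orders entries by time.

```python
import heapq
from typing import List, Sequence, Tuple

Entry = Tuple[float, int]  # (timestamp, tweet_id)


def merge_timelines(cached: Sequence[Entry],
                    celeb_lists: Sequence[Sequence[Entry]],
                    limit: int = 20) -> List[int]:
    """Merge newest-first lists into one newest-first list of unique tweet IDs."""
    # heapq.merge with reverse=True expects every input sorted largest-first.
    merged = heapq.merge(cached, *celeb_lists, reverse=True)
    seen = set()
    result = []
    for _timestamp, tweet_id in merged:
        if tweet_id in seen:
            continue  # the same tweet may appear in more than one source
        seen.add(tweet_id)
        result.append(tweet_id)
        if len(result) == limit:
            break
    return result
```

Because the merge is lazy, the loop stops as soon as `limit` tweets are collected, so the extra read cost grows with the number of followed celebrities rather than with their total tweet history.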

Real world

Twitter open-sourced their GraphJet real-time graph system (used for recommendations) and documented the hybrid fan-out approach in multiple engineering blog posts. The threshold for 'celebrity mode' is ~1M followers.

Capacity math

Normal fan-out: ~200 Redis writes per tweet (avg follower count). Celebrity threshold: 1M followers. Fan-out service workers: 100s of instances for parallelism. Target fan-out completion: <5 seconds for non-celebrity tweets.

Celebrity Tweet Fan-Out: A user with 100M followers posts a tweet. The fan-out service must write to 100M timeline caches in real-time. This creates massive write amplification. How do you implement hybrid fan-out: push to active users' caches, pull for inactive followers, and prioritize high-engagement accounts?
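One possible starting point for the question above is to split followers by recent activity. This is a sketch, not a known production design: `split_followers` is a hypothetical helper, the activity window is an assumed tuning parameter, and recency of activity is used as a stand-in for "high engagement".

```python
from typing import Dict, List, Tuple


def split_followers(last_active: Dict[str, float], now: float,
                    active_window_s: float) -> Tuple[List[str], List[str]]:
    """Split followers into (push, pull) lists by how recently they were seen.

    last_active maps follower_id -> last-seen Unix time.
    """
    push, pull = [], []
    for follower_id, seen_at in last_active.items():
        if now - seen_at <= active_window_s:
            push.append(follower_id)   # active: write into their cached timeline
        else:
            pull.append(follower_id)   # inactive: merge at their next read
    # Most recently active followers first, so they receive the tweet earliest.
    push.sort(key=lambda fid: last_active[fid], reverse=True)
    return push, pull
```

Only the `push` list generates Redis writes, so write amplification scales with the number of recently active followers rather than with the full follower count.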

Step 3: Deep Dive

Strategy	Write amplification	Read latency	Celebrity handling	Best for	Cost	Ops burden
Fan-out on write (push all)	O(followers) per tweet	O(1)	Thundering herd at write	Normal users (<10K followers)	High	High
Fan-out on read (pull all)	O(1) write	O(following)	Fine at write	Celebrity accounts (>1M followers)	Low	Low
Hybrid push + pull (Twitter)	O(normal followers)	O(1) + O(celebs)	Pull at read for celebs ✓	Real mixed workloads ✓	High	High
Pre-ranked timeline cache	Async ranking job	O(1)	Included in ranking	Algorithmic timelines	High	High
Social graph DB traversal	O(1)	O(following × tweets)	N/A	Prototype only; too slow	Medium	High

Tweet fan-out strategies — hybrid write/read fan-out handles celebrity accounts.

Hybrid fan-out: Redis timeline cache + celebrity pull merge (Python)

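import time

import redis

r = redis.Redis()
CELEB_THRESHOLD = 1_000_000   # followers; at or above this, switch to pull
MAX_TIMELINE_SIZE = 800       # tweet IDs kept per home timeline
BATCH_SIZE = 1_000            # followers per pipeline round trip (assumed value)


def get_follower_count(user_id: str) -> int:
    """Placeholder lookup; assumes a counter stored at followers:count:{user_id}."""
    return int(r.get(f"followers:count:{user_id}") or 0)


def on_new_tweet(author_id: str, tweet_id: int, follower_ids: list):
    """Fan out tweet to followers' timeline caches."""
    score = time.time()

    if len(follower_ids) >= CELEB_THRESHOLD:
        # Celebrity: store tweet in author's own sorted set; pull at read time
        r.zadd(f"tweets:{author_id}", {tweet_id: score})
        return

    # Normal user: push tweet_id into each follower's timeline, in batches
    # so no single pipeline buffers an unbounded number of commands
    for start in range(0, len(follower_ids), BATCH_SIZE):
        pipe = r.pipeline(transaction=False)
        for fid in follower_ids[start:start + BATCH_SIZE]:
            pipe.zadd(f"timeline:{fid}", {tweet_id: score})
            # Keep only the newest MAX_TIMELINE_SIZE entries
            pipe.zremrangebyrank(f"timeline:{fid}", 0, -(MAX_TIMELINE_SIZE + 1))
        pipe.execute()


def get_home_timeline(user_id: str, following: list, limit: int = 20) -> list:
    """Merge cached timeline with real-time celebrity tweets."""
    # ZREVRANGE bounds are inclusive: 0..(2*limit - 1) returns 2*limit entries
    scored = r.zrevrange(f"timeline:{user_id}", 0, limit * 2 - 1, withscores=True)

    for fid in following:
        if get_follower_count(fid) >= CELEB_THRESHOLD:
            # Five most recent tweets from each followed celebrity
            scored.extend(r.zrevrange(f"tweets:{fid}", 0, 4, withscores=True))

    # Deduplicate by tweet ID, then order newest-first by timestamp score
    # (sorting by raw ID would only be correct if IDs were time-ordered)
    newest = {int(tid): ts for tid, ts in scored}
    return sorted(newest, key=newest.get, reverse=True)[:limit]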

Component	Why Add It	Tradeoff
Kafka for Tweet Events	Celebrity tweets (50M followers) cannot be fanned out synchronously in the write path — it would take too long.	Async fan-out means followers see celebrity tweets with a delay (seconds to minutes for very large accounts).
Redis Timeline Cache	Computing a home timeline from scratch (joining all followed accounts' tweets, sorting by time) would require a full database scan.	Twitter caps timeline cache at 800 tweets per user.
Fan-Out Service	Pure push fan-out doesn't scale for celebrities: 50M writes per tweet would take hours and spike Redis write throughput.	Pull-based delivery for celebrities adds latency to timeline reads: the API must fetch the celebrity's recent tweets at read time and merge them with the pre-computed timeline.

Design decision tradeoffs

Step 4: Wrap Up

Decision	Choice	Why
Kafka for Tweet Events	Kafka topics: tweet-events (new tweets, retweets, deletes), engagement-events (likes, replies, quote tweets), and timeline-refresh-events (signals to re-rank a user's home timeline).	Celebrity tweets (50M followers) cannot be fanned out synchronously in the write path — it would take too long.
Redis Timeline Cache	Timeline cache: per-user Redis sorted set of tweet IDs (scored by timestamp), capped at 800 entries.	Computing a home timeline from scratch (joining all followed accounts' tweets, sorting by time) would require a full database scan.
Fan-Out Service	The fan-out service consumes tweet events from Kafka, looks up the author's follower list (sharded in a social graph store), and writes the tweet ID into each follower's Redis timeline sorted set.	Pure push fan-out doesn't scale for celebrities: 50M writes per tweet would take hours and spike Redis write throughput.

Key design decisions

If the interviewer asks to scale 10×: identify the single bottleneck (usually the database write path) and address it first, before scaling horizontally.

10× target: 30.0M RPS, the load at which your architecture must hold.
