Twitter/X Architecture
§1Step 2 — High-Level Design
Deliver tweets to 200M daily users in under 5 seconds. Timeline fanout, moments, and trending topics.
Add Kafka to carry tweet creation events, engagement signals, and timeline refresh triggers from the tweet API to fan-out and search indexing workers.
Kafka topics: tweet-events (new tweets, retweets, deletes), engagement-events (likes, replies, quote tweets), and timeline-refresh-events (signals to re-rank a user's home timeline). Fan-out service and search indexers both consume from tweet-events.
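The topic layout above can be sketched as a small routing table. This is a minimal illustration that uses no Kafka client: the `TweetEvent` shape, the event-type strings, and the helpers `topic_for` and `partition_key` are all assumptions, and only the three topic names come from the text.

```python
from dataclasses import dataclass

# Topic names come from the text; the event-type strings are assumed.
TOPIC_BY_EVENT_TYPE = {
    "tweet": "tweet-events",
    "retweet": "tweet-events",
    "delete": "tweet-events",
    "like": "engagement-events",
    "reply": "engagement-events",
    "quote": "engagement-events",
    "timeline_refresh": "timeline-refresh-events",
}


@dataclass(frozen=True)
class TweetEvent:
    """Hypothetical event envelope; real payloads would carry more fields."""
    event_type: str
    author_id: str
    tweet_id: int


def topic_for(event: TweetEvent) -> str:
    """Return the topic an event should be published to."""
    try:
        return TOPIC_BY_EVENT_TYPE[event.event_type]
    except KeyError:
        raise ValueError(f"unknown event type: {event.event_type!r}") from None


def partition_key(event: TweetEvent) -> bytes:
    """Key by author (an assumed choice) so one author's events share a partition."""
    return event.author_id.encode("utf-8")
```

Keying by author is one possible choice: Kafka preserves order within a partition, so a create followed by a delete from the same author would be consumed in order.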
Celebrity tweets (50M followers) cannot be fanned out synchronously in the write path — it would take too long. Kafka allows the tweet API to return to the user immediately. Fan-out happens asynchronously at background priority.
Async fan-out means followers see tweets from large accounts with a delay (seconds to minutes for very large accounts). Twitter (X) uses a hybrid model: tweets from accounts with <1M followers are pushed into followers' pre-computed timelines, while tweets from mega-celebrities are pulled on demand at read time.
Twitter's notorious 'Celebrity Problem': when Obama or Lady Gaga tweeted, the fan-out service would trigger millions of Redis write operations. Twitter solved this with the hybrid push/pull model, not pure fan-out.
Tweet creation rate: ~6K tweets/second. For a 50M-follower account: 50M fan-out writes in Redis. At 1ms per write: 50,000 seconds of single-threaded work — requires massive parallelism in the fan-out service.
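The arithmetic above can be checked with a short calculation. This is a rough sketch that assumes writes spread perfectly evenly across workers with no coordination overhead; the function names are illustrative.

```python
import math


def fanout_seconds(followers: int, write_latency_ms: int, workers: int) -> float:
    """Wall-clock seconds to finish fan-out when writes split evenly across workers."""
    return followers * write_latency_ms / 1000 / workers


def workers_needed(followers: int, write_latency_ms: int, deadline_s: int) -> int:
    """Smallest worker count that finishes within deadline_s under the same model."""
    return math.ceil(followers * write_latency_ms / (1000 * deadline_s))


# 50M followers at 1 ms per write, single-threaded: 50,000 seconds.
single_threaded = fanout_seconds(50_000_000, 1, 1)

# Hitting a 5-second target for the same account would need 10,000 parallel writers.
parallel_writers = workers_needed(50_000_000, 1, 5)
```

Under this idealized model, even 100 workers would still need 500 seconds for a single 50M-follower tweet.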
Add Redis clusters for home timeline caching (pre-computed tweet ID lists per user) and tweet metadata cache (tweet content for timeline hydration).
Timeline cache: per-user Redis sorted set of tweet IDs (scored by timestamp), capped at 800 entries. When a new tweet is fanned out, it's pushed to every follower's timeline sorted set. Tweet cache: tweet content (text, media URLs, author ID) keyed by tweet ID for fast hydration.
Computing a home timeline from scratch (joining all followed accounts' tweets, sorting by time) would require a full database scan. Pre-computing and maintaining a timeline cache in Redis reduces timeline load to an effectively constant-time Redis read, because each sorted set is capped at 800 entries.
Twitter caps the timeline cache at 800 tweets per user. A user who has been away long enough for more than 800 new tweets to accumulate has their timeline rebuilt from the database on the next visit. This cold-start rebuild is the expensive path.
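The cold-start rebuild can be sketched in memory as follows. This is an assumption-heavy illustration: `tweets_by_author` stands in for the tweet database, entries are `(timestamp, tweet_id)` pairs, and `rebuild_timeline` is a hypothetical name.

```python
import heapq
from typing import Dict, List, Tuple

MAX_TIMELINE_SIZE = 800  # cap stated in the text


def rebuild_timeline(
    following: List[str],
    tweets_by_author: Dict[str, List[Tuple[float, int]]],
    cap: int = MAX_TIMELINE_SIZE,
) -> List[int]:
    """Gather every followed account's tweets and keep the newest `cap` IDs."""
    candidates = (
        entry
        for author in following
        for entry in tweets_by_author.get(author, [])
    )
    # nlargest returns entries sorted newest-first by timestamp.
    newest = heapq.nlargest(cap, candidates, key=lambda entry: entry[0])
    return [tweet_id for _, tweet_id in newest]
```

The work grows with the total number of tweets across all followed accounts, which is why this path is reserved for cache misses.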
Twitter's timeline service keeps home timelines in Redis clusters with a reported ~100 TB of total RAM; Manhattan, Twitter's distributed key-value database, is a separate storage system. Each user's timeline sorted set holds about 6 KB of raw IDs (800 tweet IDs × 8 bytes), before Redis per-entry overhead.
Timeline cache: 330M active users × 6 KB = ~2 TB of raw tweet IDs. Tweet cache: 100M active tweets × 1 KB = 100 GB. Timeline read latency: <5ms (Redis sorted-set range read). Fan-out write rate: 6K tweets/sec × avg 200 followers = 1.2M Redis writes/sec.
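The capacity figures above follow from a few multiplications, reproduced here as a back-of-envelope sketch. The constants are the ones quoted in the text; the variable names are illustrative, and the totals count raw payload only.

```python
TWEET_ID_BYTES = 8
MAX_TIMELINE_SIZE = 800
ACTIVE_USERS = 330_000_000
ACTIVE_TWEETS = 100_000_000
TWEET_PAYLOAD_BYTES = 1_000  # ~1 KB per cached tweet
TWEETS_PER_SEC = 6_000
AVG_FOLLOWERS = 200

# 800 IDs x 8 bytes = 6,400 bytes, which the text rounds to ~6 KB.
timeline_bytes = MAX_TIMELINE_SIZE * TWEET_ID_BYTES

# 330M users x 6,400 bytes is about 2.1 TB of raw IDs.
timeline_cache_tb = ACTIVE_USERS * timeline_bytes / 1e12

# 100M tweets x 1 KB = 100 GB.
tweet_cache_gb = ACTIVE_TWEETS * TWEET_PAYLOAD_BYTES / 1e9

# 6K tweets/sec x 200 followers = 1.2M Redis writes/sec.
fanout_writes_per_sec = TWEETS_PER_SEC * AVG_FOLLOWERS
```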
Add a dedicated fan-out service that expands a tweet event into per-follower Redis timeline writes, with special handling for high-follower accounts.
The fan-out service consumes tweet events from Kafka, looks up the author's follower list (sharded in a social graph store), and writes the tweet ID into each follower's Redis timeline sorted set. For accounts with >1M followers, fan-out is skipped — their tweets are injected at timeline read time (pull-based).
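The push-or-skip decision described above can be sketched as a planning step that runs before any Redis writes. `plan_fanout` and the batch size of 1,000 are assumptions, and treating exactly 1M followers as a celebrity is an arbitrary boundary choice.

```python
from typing import List, Sequence

CELEB_THRESHOLD = 1_000_000  # ~1M followers, per the text


def plan_fanout(
    follower_ids: Sequence[str],
    batch_size: int = 1_000,
    celeb_threshold: int = CELEB_THRESHOLD,
) -> List[List[str]]:
    """Return follower batches to push to, or [] when the author is pull-only."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if len(follower_ids) >= celeb_threshold:
        # Celebrity: skip fan-out; tweets are merged in at read time.
        return []
    return [
        list(follower_ids[start:start + batch_size])
        for start in range(0, len(follower_ids), batch_size)
    ]
```

Each batch could then be handed to a separate worker or written in one Redis pipeline, which bounds the work done per round trip.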
Pure push fan-out doesn't scale for celebrities: 50M writes per tweet would take hours and spike Redis write throughput. The hybrid model (push for normal accounts, pull for celebrities) bounds the fan-out work to manageable levels.
Pull-based delivery for celebrities adds latency to timeline reads: the API must fetch the celebrity's recent tweets at read time and merge them with the pre-computed timeline. This increases timeline load from <5ms (pure Redis) to ~20ms (Redis + celebrity tweet merge).
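The read-time merge can be sketched as a k-way merge over lists that are already sorted newest-first. This is an in-memory illustration: entries are assumed to be `(timestamp, tweet_id)` pairs, and `merge_timelines` is a hypothetical name.

```python
import heapq
from typing import List, Tuple

Entry = Tuple[float, int]  # (timestamp, tweet_id)


def merge_timelines(
    cached: List[Entry],
    celeb_lists: List[List[Entry]],
    limit: int = 20,
) -> List[int]:
    """Merge newest-first lists and return the first `limit` distinct tweet IDs."""
    # heapq.merge with reverse=True expects every input sorted largest-first.
    merged = heapq.merge(cached, *celeb_lists, key=lambda e: e[0], reverse=True)
    result: List[int] = []
    seen = set()
    for _, tweet_id in merged:
        if tweet_id in seen:
            continue
        seen.add(tweet_id)
        result.append(tweet_id)
        if len(result) == limit:
            break
    return result
```

Because the merge is lazy and stops at `limit`, the extra cost per read grows with the number of followed celebrities and not with their total tweet volume.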
Twitter open-sourced their GraphJet real-time graph system (used for recommendations) and documented the hybrid fan-out approach in multiple engineering blog posts. The threshold for 'celebrity mode' is ~1M followers.
Normal fan-out: ~200 Redis writes per tweet (avg follower count). Celebrity threshold: 1M followers. Fan-out service workers: 100s of instances for parallelism. Target fan-out completion: <5 seconds for non-celebrity tweets.
§2Step 3 — Deep Dive
| Strategy | Write amplification | Read latency | Celebrity handling | Best for | Cost | Ops burden |
|---|---|---|---|---|---|---|
| Fan-out on write (push all) | O(followers) per tweet | O(1) | Thundering herd at write | Normal users (<10K followers) | High | High |
| Fan-out on read (pull all) | O(1) write | O(following) | Fine at write | Celebrity accounts (>1M followers) | Low | Low |
| Hybrid push + pull (Twitter) | O(normal followers) | O(1) + O(celebs) | Pull at read for celebs ✓ | Real mixed workloads ✓ | High | High |
| Pre-ranked timeline cache | Async ranking job | O(1) | Included in ranking | Algorithmic timelines | High | High |
| Social graph DB traversal | O(1) | O(following x tweets) | N/A | Prototype only -- too slow | Medium | High |
Tweet fan-out strategies — hybrid write/read fan-out handles celebrity accounts.
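```python
import time

import redis

r = redis.Redis()

CELEB_THRESHOLD = 1_000_000
MAX_TIMELINE_SIZE = 800
PIPELINE_BATCH = 1_000  # assumed batch size; bounds memory per round trip


def get_follower_count(user_id: str) -> int:
    """Hypothetical helper: look up the follower count in the social graph store."""
    raise NotImplementedError("wire this to the social graph store")


def on_new_tweet(author_id: str, tweet_id: int, follower_ids: list) -> None:
    """Fan out tweet to followers' timeline caches."""
    score = time.time()
    if len(follower_ids) >= CELEB_THRESHOLD:
        # Celebrity: store tweet in author's own sorted set; pull at read time
        r.zadd(f"tweets:{author_id}", {tweet_id: score})
        return
    # Normal user: push tweet_id into each follower's timeline, in batches
    for start in range(0, len(follower_ids), PIPELINE_BATCH):
        pipe = r.pipeline(transaction=False)
        for fid in follower_ids[start:start + PIPELINE_BATCH]:
            pipe.zadd(f"timeline:{fid}", {tweet_id: score})
            # Trim so only the newest MAX_TIMELINE_SIZE entries remain
            pipe.zremrangebyrank(f"timeline:{fid}", 0, -(MAX_TIMELINE_SIZE + 1))
        pipe.execute()


def get_home_timeline(user_id: str, following: list, limit: int = 20) -> list:
    """Merge cached timeline with real-time celebrity tweets."""
    # ZREVRANGE bounds are inclusive, so stop at limit * 2 - 1
    cached_ids = [int(x) for x in r.zrevrange(f"timeline:{user_id}", 0, limit * 2 - 1)]
    celeb_tweet_ids = []
    for fid in following:
        if get_follower_count(fid) >= CELEB_THRESHOLD:
            ids = r.zrevrange(f"tweets:{fid}", 0, 5)
            celeb_tweet_ids.extend(int(x) for x in ids)
    # Sorting by ID is newest-first only if IDs are time-ordered (e.g. Snowflake)
    all_ids = sorted(set(cached_ids + celeb_tweet_ids), reverse=True)
    return all_ids[:limit]
```

| Component | Why Add It | Tradeoff |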
|---|---|---|
| Kafka for Tweet Events | Celebrity tweets (50M followers) cannot be fanned out synchronously in the write path — it would take too long. | Async fan-out means followers see celebrity tweets with a delay (seconds to minutes for very large accounts). |
| Redis Timeline Cache | Computing a home timeline from scratch (joining all followed accounts' tweets, sorting by time) would require a full database scan. | Twitter caps timeline cache at 800 tweets per user. |
| Fan-Out Service | Pure push fan-out doesn't scale for celebrities: 50M writes per tweet would take hours and spike Redis write throughput. | Pull-based delivery for celebrities adds latency to timeline reads: the API must fetch the celebrity's recent tweets at read time and merge them with the pre-computed timeline. |
Design decision tradeoffs
A user with 100M followers posts a tweet. The fan-out service must write to 100M timeline caches in real-time. This creates massive write amplification. How do you implement hybrid fan-out: push to active users' caches, pull for inactive followers, and prioritize high-engagement accounts?
A live event (World Cup final) causes 100K tweets/second on a single hashtag. Write throughput to timeline-api saturates. How do you implement rate limiting per topic, write batching, and async fan-out queues to absorb the spike?
A network partition isolates the timeline cache cluster from tweet-api. New tweets can't update timelines. How do you implement read-your-writes consistency for the author, serve stale timelines for others, and reconcile after the partition heals?
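For the partition scenario, one possible shape for read-your-writes is to overlay the author's own acknowledged tweets onto whatever stale timeline the cache returns. This is a minimal sketch and not a full answer: `own_recent` is assumed to come from a store the tweet API can still reach, tweet IDs are assumed to be time-ordered, and reconciliation after the partition heals is out of scope.

```python
from typing import List


def timeline_with_own_writes(
    stale_timeline: List[int],
    own_recent: List[int],
    limit: int = 20,
) -> List[int]:
    """Overlay the author's own recent tweet IDs onto a possibly stale timeline."""
    # Set union removes IDs that already reached the cache before the partition.
    merged = set(stale_timeline) | set(own_recent)
    # Assumes time-ordered IDs, so descending ID order is newest-first.
    return sorted(merged, reverse=True)[:limit]
```

Other users would simply be served `stale_timeline` unchanged until fan-out catches up.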
§3Step 4 — Wrap Up
| Decision | Choice | Why |
|---|---|---|
| Kafka for Tweet Events | Kafka topics: tweet-events (new tweets, retweets, deletes), engagement-events (likes, replies, quote tweets), and timeline-refresh-events (signals to re-rank a user's home timeline). | Celebrity tweets (50M followers) cannot be fanned out synchronously in the write path — it would take too long. |
| Redis Timeline Cache | Timeline cache: per-user Redis sorted set of tweet IDs (scored by timestamp), capped at 800 entries. | Computing a home timeline from scratch (joining all followed accounts' tweets, sorting by time) would require a full database scan. |
| Fan-Out Service | The fan-out service consumes tweet events from Kafka, looks up the author's follower list (sharded in a social graph store), and writes the tweet ID into each follower's Redis timeline sorted set. | Pure push fan-out doesn't scale for celebrities: 50M writes per tweet would take hours and spike Redis write throughput. |
Key design decisions