Instagram Feed
§1 Step 2 — High-Level Design
Rank 100M photos per second for each user. Story sequencing, explore feed, and creator monetization.
Add an edge API gateway to split and route feed, Stories, and Explore traffic to their respective backend services.
The API gateway authenticates requests (JWT/session token), applies rate limits, and routes to the correct backend: Feed API for home feed, Stories API for Stories ring, Explore API for discovery. It also handles compression and protocol negotiation.
Without a gateway, each backend service would need to implement auth and rate limiting independently. The gateway centralizes cross-cutting concerns and provides a stable client-facing API even as backend services evolve.
The gateway is in the critical path for every request. It must be highly available (multiple instances behind an LB) and fast (<5ms processing overhead). Instagram uses an Nginx-based gateway with custom modules for Instagram-specific logic.
Instagram runs on a custom Python (Django) stack fronted by nginx. The nginx layer handles SSL termination, rate limiting, and static content. Django handles dynamic API responses. At 4M RPS, this requires thousands of API servers.
Gateway overhead: <5ms. Rate limit: 200 requests/hour per user for sensitive endpoints. At 4M RPS, gateway tier needs ~400 nginx processes at 10K RPS each.
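The per-user limit and the sizing figure above can be sketched in a few lines. This is a minimal in-memory illustration, not Instagram's gateway code: the class name `FixedWindowRateLimiter`, the fixed-window policy, and the helper `gateway_processes_needed` are assumptions made for the example. A real gateway would keep the counters in shared storage so every instance sees the same count.

```python
import time
from collections import defaultdict


class FixedWindowRateLimiter:
    """Per-user fixed-window counter (illustrative, single process, in memory)."""

    def __init__(self, limit: int = 200, window_seconds: int = 3600, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so windows can be tested without sleeping
        self._counts = defaultdict(int)  # (user_id, window index) -> request count

    def allow(self, user_id: str) -> bool:
        """Return True and count the request if the user is under the limit."""
        window = int(self.clock() // self.window_seconds)
        key = (user_id, window)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


def gateway_processes_needed(total_rps: int, rps_per_process: int) -> int:
    """Ceiling division: processes required to absorb total_rps."""
    return -(-total_rps // rps_per_process)


limiter = FixedWindowRateLimiter(limit=200, window_seconds=3600)
print(limiter.allow("user-1"))                          # first request in the window
print(gateway_processes_needed(4_000_000, 10_000))      # 400
```

Counters for past windows are never evicted in this sketch; a production version would expire them.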
Add Kafka to carry post creation events, engagement signals (likes, comments, saves), and Stories publish events to ranking workers.
Kafka topics: post-events (new posts from followed accounts), engagement-events (likes, comments, shares, saves per post), story-events (Stories publishes and expirations), and explore-signals (search queries, video completions). Ranking workers consume from all engagement topics.
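One way to picture the topic layout is a lookup from event type to topic name. The topic names come from the description above; the event type strings (`"like"`, `"story_published"`, and so on) and the function `topic_for` are hypothetical names chosen for this sketch, and no Kafka client is involved.

```python
# Topic names are from the design; event type strings are illustrative assumptions.
TOPIC_BY_EVENT_TYPE = {
    "post_created": "post-events",
    "like": "engagement-events",
    "comment": "engagement-events",
    "share": "engagement-events",
    "save": "engagement-events",
    "story_published": "story-events",
    "story_expired": "story-events",
    "search_query": "explore-signals",
    "video_completion": "explore-signals",
}


def topic_for(event: dict) -> str:
    """Return the topic an event should be published to."""
    event_type = event.get("type")
    try:
        return TOPIC_BY_EVENT_TYPE[event_type]
    except KeyError:
        raise ValueError(f"unknown event type: {event_type!r}") from None


print(topic_for({"type": "save", "post_id": "p42"}))  # engagement-events
```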
Engagement signals are the training data for Instagram's ranking model and the real-time features for the serving model. Publishing to Kafka decouples the API tier (which receives these signals) from the ranking tier (which processes them).
Kafka adds 2–5ms to signal propagation. For the ranking model's real-time features (recency of engagement), this latency is acceptable. The model is retrained daily; real-time features update every few minutes from the Kafka stream.
Instagram processes billions of engagement events per day. Their ML ranking pipeline runs on a combination of Kafka (streaming) and Hadoop/Spark (batch). The ranking model serves 4M feed loads per second using precomputed candidates.
Kafka topics: post-events ~10M events/day, engagement-events ~100B events/day (likes + views). Ranking model: inference <10ms per candidate. Feed generation: 200 candidates scored per feed load.
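Converting the daily volumes to per-second rates makes the load easier to compare with the 4M RPS figure. The snippet below is plain arithmetic on the numbers quoted above; the results are daily averages, so peak rates would be higher.

```python
SECONDS_PER_DAY = 86_400


def per_second(events_per_day: int) -> float:
    """Average events per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


post_rate = per_second(10_000_000)                # about 116 events/s
engagement_rate = per_second(100_000_000_000)     # about 1.16M events/s

# If every feed load scored its 200 candidates synchronously:
candidate_scores_per_second = 4_000_000 * 200     # 800M scorings/s

print(round(post_rate), round(engagement_rate), candidate_scores_per_second)
```

The last figure is the reason the design leans on precomputed candidates rather than scoring on every request.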
Add separate Redis clusters for ranked feed candidates, Stories state, and Explore candidates — each with different TTL and update patterns.
Feed cache: ranked list of post IDs per user, refreshed asynchronously by ranking workers. Stories cache: active story IDs per user with expiry timestamps; updated on publish and garbage-collected by expiry workers. Explore cache: pre-computed candidate pools per interest cluster.
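The Stories cache behavior (update on publish, garbage-collect on expiry) can be sketched with a dictionary plus a min-heap ordered by expiry time. This is an in-memory stand-in for the Redis tier; the class `StoriesCache`, its method names, and the 24-hour TTL constant are assumptions for illustration.

```python
import heapq

STORY_TTL_SECONDS = 24 * 3600  # assumed Stories lifetime


class StoriesCache:
    """Active story IDs per user, with expiry timestamps (in-memory sketch)."""

    def __init__(self):
        self._active = {}        # user_id -> {story_id: expires_at}
        self._expiry_heap = []   # (expires_at, user_id, story_id), soonest first

    def publish(self, user_id: str, story_id: str, now: float) -> None:
        """Record a new story and schedule its expiry."""
        expires_at = now + STORY_TTL_SECONDS
        self._active.setdefault(user_id, {})[story_id] = expires_at
        heapq.heappush(self._expiry_heap, (expires_at, user_id, story_id))

    def active_stories(self, user_id: str) -> list:
        """Story IDs currently held in the cache for a user."""
        return list(self._active.get(user_id, {}))

    def collect_expired(self, now: float) -> int:
        """Expiry-worker pass: drop every story whose expiry time has passed."""
        removed = 0
        while self._expiry_heap and self._expiry_heap[0][0] <= now:
            _, user_id, story_id = heapq.heappop(self._expiry_heap)
            stories = self._active.get(user_id, {})
            if stories.pop(story_id, None) is not None:
                removed += 1
            if not stories:
                self._active.pop(user_id, None)
        return removed
```

Because entries leave the cache only when `collect_expired` runs, a stalled worker leaves expired stories visible, which is why the expiry worker needs monitoring.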
Scoring 200 candidates from the ranking model takes 50–200ms. At 4M RPS, doing this synchronously per request would mean 200K–800K scoring jobs in flight at any moment (4M requests/s × 0.05–0.2 s each), which is impractical to provision. Precomputing ranked feeds into Redis and serving them with O(1) reads keeps feed loads within the p99 < 500ms SLO.
Precomputed feeds are stale by definition — ranking workers refresh them every few minutes, not on every engagement event. Users may see slightly stale rankings. Instagram handles this by mixing precomputed candidates with a small set of real-time candidates fetched at serve time.
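A rough sketch of the mixing step follows. How many positions go to real-time candidates and where they are placed are policy choices the text does not specify; the function `mix_candidates`, the `realtime_slots` parameter, and the choice to put fresh posts first are assumptions made only to show the deduplicating merge.

```python
def mix_candidates(precomputed: list, realtime: list,
                   limit: int = 20, realtime_slots: int = 3) -> list:
    """Fill up to realtime_slots positions with fresh posts, the rest from the cache."""
    # Deduplicate real-time candidates while keeping their order.
    fresh = list(dict.fromkeys(realtime))[:realtime_slots]
    seen = set(fresh)
    feed = list(fresh)
    # Append precomputed posts in ranked order, skipping anything already shown.
    for post_id in precomputed:
        if len(feed) >= limit:
            break
        if post_id not in seen:
            seen.add(post_id)
            feed.append(post_id)
    return feed[:limit]


print(mix_candidates(["a", "b", "c", "d"], ["x", "b"], limit=4, realtime_slots=2))
# ['x', 'b', 'a', 'c']
```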
Instagram's feed ranking pipeline writes precomputed feeds to Memcached (not Redis). Their architecture uses both Memcached (for large feed blobs) and TAO (Facebook's graph cache) for social graph data.
Feed cache entry: ~2 KB (200 post IDs). At 1B users × 2 KB = 2 TB cache. Refresh frequency: every 10–30 minutes per user. Stories cache: 10M active stories × 100 bytes = 1 GB.
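The sizing above is reproduced below as arithmetic, using decimal units (1 KB = 1,000 bytes) to match the quoted figures. The refresh-rate calculation is an extra derived number, not one stated in the text: it shows what a 10–30 minute refresh interval implies if all 1B users were refreshed on that schedule.

```python
KB = 1_000
GB = 1_000_000_000
TB = 1_000_000_000_000


def cache_bytes(entries: int, entry_bytes: int) -> int:
    """Total cache footprint for a given entry count and entry size."""
    return entries * entry_bytes


def refreshes_per_second(users: int, interval_minutes: int) -> float:
    """Average refresh rate if every user is refreshed once per interval."""
    return users / (interval_minutes * 60)


feed_tb = cache_bytes(1_000_000_000, 2 * KB) / TB     # 2.0 TB
stories_gb = cache_bytes(10_000_000, 100) / GB        # 1.0 GB
bytes_per_post_id = 2 * KB / 200                      # 10.0 bytes per ID

fast = refreshes_per_second(1_000_000_000, 10)        # about 1.67M refreshes/s
slow = refreshes_per_second(1_000_000_000, 30)        # about 556K refreshes/s
print(feed_tb, stories_gb, bytes_per_post_id, round(fast), round(slow))
```

In practice only active users would need refreshing, so the real refresh rate would be lower than these upper bounds.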
§2 Step 3 — Deep Dive
| Strategy | Write cost | Read cost | Freshness | Best for | Cost | Ops burden |
|---|---|---|---|---|---|---|
| Fan-out on write (push) | O(followers) per post | O(1) read | Immediate | Users with <10K followers ✓ | Medium | Medium |
| Fan-out on read (pull) | O(1) write | O(following) per read | Real-time | Celebrity accounts (>1M followers) ✓ | Low | Low |
| Hybrid push+pull (Instagram) | O(followers) for normal | O(1) + O(celebs) mix | Near-real-time | Realistic mixed workload ✓ | Medium | High |
| Pre-computed ranked feed | Async ranking job | O(1) | Seconds delay | Algorithmic feeds, personalized ranking | High | High |
| Social graph traversal at read | O(1) write | O(following × posts) | Real-time | Small social graphs, prototype only | Low | High |
Feed generation strategies — hybrid push/pull wins for mixed follower counts.
```python
import time

import redis

# decode_responses=True returns str members instead of bytes
r = redis.Redis(decode_responses=True)

CELEBRITY_THRESHOLD = 10_000
MAX_TIMELINE_SIZE = 500


def get_follower_count(user_id: str) -> int:
    """Hypothetical helper: assumes follower IDs are kept in a Redis set."""
    return r.scard(f"followers:{user_id}")


def post_created(user_id: str, post_id: str, follower_ids: list):
    """Fan out a new post to followers' feed caches."""
    now = time.time()
    if len(follower_ids) > CELEBRITY_THRESHOLD:
        # Celebrity: skip push, followers will pull at read time
        r.zadd(f"posts:{user_id}", {post_id: now})
        return
    # Normal user: push post_id into each follower's feed sorted set
    pipe = r.pipeline(transaction=False)
    for fid in follower_ids:
        pipe.zadd(f"feed:{fid}", {post_id: now})
        # Trim the feed to its newest MAX_TIMELINE_SIZE entries
        pipe.zremrangebyrank(f"feed:{fid}", 0, -(MAX_TIMELINE_SIZE + 1))
    pipe.execute()


def get_feed(user_id: str, following: list, limit: int = 20) -> list:
    """Merge pre-computed feed with real-time celebrity posts."""
    # Scores are post timestamps, so they double as the sort key for the merge
    merged = dict(r.zrevrange(f"feed:{user_id}", 0, limit - 1, withscores=True))
    celebs = [f for f in following if get_follower_count(f) > CELEBRITY_THRESHOLD]
    for celeb_id in celebs:
        # Pull each celebrity's newest posts at read time
        merged.update(r.zrevrange(f"posts:{celeb_id}", 0, limit - 1, withscores=True))
    # Dict keys deduplicate post IDs; newest first
    return sorted(merged, key=merged.get, reverse=True)[:limit]
```

| Component | Why Add It | Tradeoff |
|---|---|---|
| API Gateway | Without a gateway, each backend service would need to implement auth and rate limiting independently. | The gateway is in the critical path for every request. |
| Kafka for Post Events | Engagement signals are the training data for Instagram's ranking model and the real-time features for the serving model. | Kafka adds 2–5ms to signal propagation. |
| Redis Cache Tiers | Scoring 200 candidates from the ranking model takes 50–200ms. | Precomputed feeds are stale by definition — ranking workers refresh them every few minutes, not on every engagement event. |
Design decision tradeoffs
A network partition isolates one Redis cache tier. Feed and Stories queries to the isolated cache will time out. The system must degrade gracefully: feed rankings fall back to database queries (slower), and Stories fall back to checking the primary metadata store.
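The fallback path can be sketched as a small wrapper around the cache read. The function `read_with_fallback` and its callable parameters are hypothetical; the default `cache_errors` tuple uses Python's built-in exceptions, and a real Redis client raises its own exception classes, which would need to be passed in instead.

```python
from typing import Any, Callable, Tuple


def read_with_fallback(
    key: str,
    cache_get: Callable[[str], Any],
    store_get: Callable[[str], Any],
    cache_errors: tuple = (TimeoutError, ConnectionError),
) -> Tuple[Any, str]:
    """Try the cache first; on error or miss, read from the slower primary store."""
    try:
        value = cache_get(key)
    except cache_errors:
        # Cache tier unreachable: degrade to the primary store
        return store_get(key), "store"
    if value is None:
        # Ordinary cache miss
        return store_get(key), "store"
    return value, "cache"


def partitioned_cache(key: str):
    raise TimeoutError("cache tier unreachable")


print(read_with_fallback("feed:u1", partitioned_cache, lambda k: ["p1", "p2"]))
# (['p1', 'p2'], 'store')
```

Returning the source alongside the value lets callers count degraded reads. The cache client also needs a short timeout configured, otherwise the fallback only triggers after a long wait.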
A viral post from a major creator (100M followers) suddenly receives 10M engagements in 60 seconds. The ranking workers cannot keep up with signal processing, so the feed cache becomes stale: new rankings are delayed by 5–10 minutes and users see outdated feed candidates. Mitigation: rank post updates asynchronously in batches; use exponential backoff for cache refreshes.
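Both mitigations can be illustrated briefly: collapsing a burst of engagement events into per-post counts so one ranking update covers many events, and spacing out refresh retries with capped exponential backoff. The function names and the default base, factor, and cap values are assumptions for this sketch.

```python
from collections import Counter


def coalesce_engagements(events: list) -> Counter:
    """Collapse (post_id, event_type) events into one count per pair."""
    return Counter(events)


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0, attempts: int = 6) -> list:
    """Delay in seconds before each retry, growing geometrically up to cap."""
    return [min(cap, base * factor ** attempt) for attempt in range(attempts)]


burst = [("p1", "like"), ("p1", "like"), ("p1", "save"), ("p2", "like")]
print(coalesce_engagements(burst)[("p1", "like")])   # 2
print(backoff_delays(cap=10.0, attempts=5))          # [1.0, 2.0, 4.0, 8.0, 10.0]
```

Adding random jitter to each delay is a common refinement so that many workers do not retry at the same instant.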
A Stories expiry worker crashes. Expiration events for 10M Stories from the previous 24 hours pile up unprocessed in Kafka, so the expired entries are never garbage-collected. The Stories cache grows to 5GB and the Kafka consumer group falls behind by 30+ minutes. Users see expired Stories in their Stories ring. Recovery: restart the expiry worker; the backlog drains over 2–3 minutes.
§3 Step 4 — Wrap Up
| Decision | Choice | Why |
|---|---|---|
| API Gateway | The API gateway authenticates requests (JWT/session token), applies rate limits, and routes to the correct backend: Feed API for home feed, Stories API for Stories ring, Explore API for discovery. | Without a gateway, each backend service would need to implement auth and rate limiting independently. |
| Kafka for Post Events | Kafka topics: post-events (new posts from followed accounts), engagement-events (likes, comments, shares, saves per post), story-events (Stories publishes and expirations), and explore-signals (search queries, video completions). | Engagement signals are the training data for Instagram's ranking model and the real-time features for the serving model. |
| Redis Cache Tiers | Feed cache: ranked list of post IDs per user, refreshed asynchronously by ranking workers. | Scoring 200 candidates from the ranking model takes 50–200ms. |
Key design decisions