Social Media Feed

Intermediate · Fan-out · Caching

Social · 60 min read

Step 2: High-Level Design

Build a news feed like Twitter/X, compare fan-out on write with fan-out on read, and handle celebrities with 100M followers.

System architecture overview

Progressive build — add each component step by step

Add Redis for Pre-Computed Timelines

Store each user's feed as a Redis sorted set (score = timestamp). Feed reads are a single ZREVRANGE call — no database join required.

What it does

Each user has a Redis sorted set of post IDs (scored by timestamp). Feed read = ZREVRANGE of their timeline key. Fan-out write = ZADD to each follower's timeline.

Why it matters

Building a feed from the database requires expensive JOINs across posts, follows, and ranking tables. Pre-computed Redis timelines replace those JOINs with a single in-memory range read, which typically completes in under 1 ms.

Trade-off

Fan-out-on-write has write amplification: 1 post by someone with 1M followers = 1M Redis writes. This is fine for normal users but catastrophic for celebrities.
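To make the write amplification concrete, here is a minimal in-memory sketch of per-user timelines. `TimelineStore` and `fanOutOnWrite` are hypothetical names used only for illustration; the store imitates the sorted-set behaviour described above (`add` plays the role of ZADD, `newest` the role of ZREVRANGE) and is not a Redis client.

```typescript
// In-memory stand-in for per-user Redis sorted sets (illustrative model only).
type TimelineEntry = { postId: string; timestamp: number }

class TimelineStore {
  private timelines = new Map<string, TimelineEntry[]>()
  writes = 0 // counts ZADD-equivalent operations

  // Plays the role of: ZADD timeline:{userId} timestamp postId
  add(userId: string, entry: TimelineEntry): void {
    const timeline = this.timelines.get(userId) ?? []
    timeline.push(entry)
    timeline.sort((a, b) => a.timestamp - b.timestamp) // keep ascending by score
    this.timelines.set(userId, timeline)
    this.writes++
  }

  // Plays the role of: ZREVRANGE timeline:{userId} 0 limit-1 (newest first)
  newest(userId: string, limit: number): string[] {
    if (limit <= 0) return []
    const timeline = this.timelines.get(userId) ?? []
    return timeline.slice(-limit).reverse().map(e => e.postId)
  }
}

// One post costs one timeline write per follower.
function fanOutOnWrite(store: TimelineStore, followers: string[], entry: TimelineEntry): void {
  for (const followerId of followers) store.add(followerId, entry)
}

const store = new TimelineStore()
const followers = ['alice', 'bob', 'carol']
fanOutOnWrite(store, followers, { postId: 'p1', timestamp: 100 })
fanOutOnWrite(store, followers, { postId: 'p2', timestamp: 200 })

const aliceFeed = store.newest('alice', 10) // newest first: p2, then p1
const totalWrites = store.writes            // 2 posts × 3 followers = 6 writes
```

The read is a single lookup per user, while the write count grows with posts × followers, which is the cost that becomes a problem for very large accounts.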

Real world

Twitter's original architecture used Redis sorted sets for pre-computed timelines. Instagram uses Cassandra for timeline storage at 60M reads/day.

Capacity math

1 timeline entry = ~40 bytes (post ID + timestamp + metadata). 100 posts per user × 100M users × 40 bytes = 400 GB of Redis storage. Shard by user ID across 8-10 Redis nodes.
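The same arithmetic as a small sketch. `timelineStorageBytes` is a hypothetical helper, and the estimate counts raw entry bytes only; Redis's own per-key and per-entry overhead is ignored here and would raise the real figure.

```typescript
// Raw timeline storage = users × entries per user × bytes per entry.
function timelineStorageBytes(users: number, postsPerUser: number, bytesPerEntry = 40): number {
  return users * postsPerUser * bytesPerEntry
}

const totalGB = timelineStorageBytes(100_000_000, 100) / 1e9 // 400 GB
const perNodeGBAt8 = totalGB / 8                              // 50 GB per node
const perNodeGBAt10 = totalGB / 10                            // 40 GB per node

// If timelines instead hold 1000 entries (the cap used by the fan-out worker
// later in this lesson), the same formula gives 4 TB.
const cappedTB = timelineStorageBytes(100_000_000, 1000) / 1e12
```

Storage scales linearly with timeline length, so the per-user cap is one of the main levers on Redis cost.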

Add Kafka for Async Fan-Out Workers

Post creation publishes to Kafka. Fan-out worker service consumes events and writes to follower timelines asynchronously — decoupling post latency from follower count.

What it does

Write API publishes PostCreated to Kafka. Fan-out workers consume the event and write to up to N follower timelines in Redis. Workers scale independently of the write API.

Why it matters

Synchronous fan-out makes post latency proportional to follower count. A celeb with 10M followers would cause a 10M-write synchronous operation — unacceptable.

Trade-off

Async fan-out means followers see posts with a short delay (seconds). Most users accept < 10s delivery lag in exchange for consistent low-latency post creation.
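A minimal sketch of the decoupling and the resulting delivery lag, assuming a plain array as a stand-in for the Kafka topic. All names here (`topic`, `createPost`, `runFanOutWorker`, `followerGraph`) are hypothetical and exist only for illustration; no Kafka client is involved.

```typescript
type PostCreated = { authorId: string; postId: string; timestamp: number }

const topic: PostCreated[] = []                      // stand-in for a Kafka topic
const feedTimelines = new Map<string, string[]>()    // stand-in for Redis timelines
const followerGraph = new Map<string, string[]>([['dave', ['alice', 'bob']]])

// Write path: enqueue one event and return. Cost does not depend on follower count.
function createPost(authorId: string, postId: string, timestamp: number): void {
  topic.push({ authorId, postId, timestamp })
}

// Worker: drains pending events and performs the per-follower writes later.
function runFanOutWorker(): number {
  let writes = 0
  while (topic.length > 0) {
    const event = topic.shift()!
    for (const followerId of followerGraph.get(event.authorId) ?? []) {
      const timeline = feedTimelines.get(followerId) ?? []
      timeline.unshift(event.postId) // newest first
      feedTimelines.set(followerId, timeline)
      writes++
    }
  }
  return writes
}

createPost('dave', 'p1', 100)
const visibleBeforeWorker = feedTimelines.get('alice')?.length ?? 0 // 0: not delivered yet
const workerWrites = runFanOutWorker()                              // 2 follower writes
const visibleAfterWorker = feedTimelines.get('alice') ?? []         // contains p1
```

The window between `createPost` returning and the worker running is the delivery lag described above; the post itself is accepted immediately.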

Real world

Twitter switched to async fan-out via a delivery pipeline. Facebook's Iris, an ordered queue of messaging updates built for Messenger, decouples producers from delivery in a similar way.

Capacity math

Kafka fan-out at 1M posts/hour × an average of 500 followers = 500M timeline writes/hour. Spread across 100 fan-out worker partitions, that is 5M writes/hour per worker (about 1,400 writes/second), which is very manageable.
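The throughput estimate can be checked with a short sketch. `fanOutLoad` is a hypothetical helper, and it assumes load is spread evenly across workers, which real follower distributions will not guarantee.

```typescript
// Per-worker fan-out load, assuming writes are spread evenly across workers.
function fanOutLoad(postsPerHour: number, avgFollowers: number, workers: number) {
  const writesPerHour = postsPerHour * avgFollowers
  const perWorkerPerHour = writesPerHour / workers
  const perWorkerPerSecond = perWorkerPerHour / 3600
  return { writesPerHour, perWorkerPerHour, perWorkerPerSecond }
}

const load = fanOutLoad(1_000_000, 500, 100)
// load.writesPerHour      = 500,000,000
// load.perWorkerPerHour   = 5,000,000
// load.perWorkerPerSecond ≈ 1,389
```

Because the average hides skew, a single post from a high-follower account can briefly exceed these per-worker numbers by a wide margin.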

Add Object Storage for Media

Store post images and videos in object storage. Posts in Postgres reference media by URL — actual bytes served via CDN without touching your API servers.

What it does

Photos and videos are uploaded directly to S3 using pre-signed URLs. Post metadata (caption, user, timestamp, media URLs) is stored in Postgres. CDN serves media at the edge.

Why it matters

At 100M DAU posting 1 photo each, that's 100M images/day. Routing uploads through API servers saturates their network bandwidth. With pre-signed URLs, clients upload straight to object storage, so the API only issues the URL and never handles the bytes.

Trade-off

Pre-signed URLs require an expiry time. For private content, short TTLs (15 minutes) add re-signing overhead, because clients must request a fresh URL once the old one expires. Public content can use permanent CDN URLs.

Real world

Instagram stores all media in S3, serving through a custom CDN with 99%+ cache hit rates. TikTok uses distributed object storage across multiple clouds for 500M daily video uploads.

Capacity math

S3 scales to exabytes. At 100M images/day × 2MB average = 200TB/day of new storage. CDN cache hit rates of 95%+ mean origin storage serves < 5% of reads.
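A quick sketch of the storage and origin-traffic arithmetic. The helper name `originRequests` and the one-million-read sample are illustrative assumptions, and decimal units are used (1 TB = 1,000,000 MB).

```typescript
const imagesPerDay = 100_000_000
const avgImageMB = 2

// New storage per day in TB, and per year in PB.
const newStorageTBPerDay = (imagesPerDay * avgImageMB) / 1_000_000 // 200 TB/day
const newStoragePBPerYear = (newStorageTBPerDay * 365) / 1000      // 73 PB/year

// Reads that miss the CDN and reach origin storage, given a hit rate in percent.
function originRequests(totalReads: number, cdnHitPercent: number): number {
  return (totalReads * (100 - cdnHitPercent)) / 100
}

const originAt95 = originRequests(1_000_000, 95) // 50,000 of 1M reads reach origin
const originAt99 = originRequests(1_000_000, 99) // 10,000 of 1M reads reach origin
```

Moving from a 95% to a 99% hit rate cuts origin reads by a factor of five, which is why hit rate matters more than raw origin capacity for read traffic.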

Step 3: Deep Dive

Each user has a Redis sorted set of post IDs (scored by timestamp). Feed read = ZREVRANGE of their timeline key. Fan-out write = ZADD to each follower's timeline.

Strategy	Write cost	Read cost	Celebrity problem	Best for	Cost	Ops burden
Fan-out on write	O(followers)	O(1) Redis read	Write amplification	< 10K followers/user ✓	Medium	Medium
Fan-out on read	O(1)	O(following) joins	None	Celebrity-heavy platforms	Low	Low
Hybrid (write normal, read celeb)	O(normal followers)	O(celeb following)	Solved	Twitter/Instagram scale ✓	Medium	High
Ranked feed service	O(followers)	O(1) with ML rank	Write amplification	Personalized ranking	High	High
Pull-based aggregation	O(1)	O(following)	None	Low-frequency, small scale	Low	Medium

Feed generation strategies — hybrid wins for celebrity-heavy platforms.

TypeScript: Social Feed, fan-out-on-write with Redis sorted sets

import { createClient } from 'redis'
import { Kafka } from 'kafkajs'

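// node-redis v4+ clients must be connected before use: call `await redis.connect()` once at startup.
const redis = createClient()
// Kafka client for the consumer that feeds PostCreated events into fanOutPost (consumer wiring omitted here).
const kafka = new Kafka({ brokers: ['kafka:9092'] })

// Authors at or above this follower count are treated as celebrities
const CELEBRITY_THRESHOLD = 1_000_000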

export async function getFeed(userId: string, limit = 20): Promise<string[]> {
  const timelineKey = `timeline:${userId}`

  // 1. Read pre-computed timeline from Redis sorted set (newest first)
  const postIds = await redis.zRange(timelineKey, 0, limit - 1, { REV: true })

  // 2. For users who follow celebrities, merge those authors' recent posts at read time
  const following = await getCelebrityFollows(userId)
  if (following.length > 0) {
    const celebPosts = await fetchRecentPosts(following, limit)
    // mergeSortedByTime receives bare post IDs, so it is assumed to resolve each post's timestamp itself
    return mergeSortedByTime([...postIds, ...celebPosts]).slice(0, limit)
  }

  return postIds
}

// Fan-out worker: consumes PostCreated events from Kafka
export async function fanOutPost(authorId: string, postId: string, timestamp: number): Promise<void> {
  const followerCount = await getFollowerCount(authorId)

  // Skip fan-out for celebrities — their posts are fetched at read time
  if (followerCount >= CELEBRITY_THRESHOLD) return

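  const followers = await getFollowers(authorId)
  const BATCH_SIZE = 1000 // illustrative value; bounds the size of each round trip

  // Send commands in bounded batches instead of one MULTI holding up to ~2M commands
  for (let i = 0; i < followers.length; i += BATCH_SIZE) {
    const pipeline = redis.multi()
    for (const followerId of followers.slice(i, i + BATCH_SIZE)) {
      const key = `timeline:${followerId}`
      pipeline.zAdd(key, { score: timestamp, value: postId })
      // Keep only the newest 1000 posts per timeline
      pipeline.zRemRangeByRank(key, 0, -1001)
    }
    await pipeline.exec()
  }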
}

declare function getCelebrityFollows(userId: string): Promise<string[]>
declare function fetchRecentPosts(userIds: string[], limit: number): Promise<string[]>
declare function mergeSortedByTime(postIds: string[]): string[]
declare function getFollowerCount(userId: string): Promise<number>
declare function getFollowers(userId: string): Promise<string[]>

Component	Why Add It	Tradeoff
Redis for Pre-Computed Timelines	Building a feed from the database requires expensive JOINs across posts, follows, and ranking tables.	Fan-out-on-write has write amplification: 1 post by someone with 1M followers = 1M Redis writes.
Kafka for Async Fan-Out Workers	Synchronous fan-out makes post latency proportional to follower count.	Async fan-out means followers see posts with a short delay (seconds).
Object Storage for Media	At 100M DAU posting 1 photo each, that's 100M images/day.	Pre-signed URLs require an expiry time.

Design decision tradeoffs

Posts DB Failure

Postgres goes down. Users with pre-computed Redis timelines can still read feeds. What percentage of users are affected?

Feed API Server Crash

api-1 crashes. Users connected to it see feed loading errors. How do you implement health checks, session affinity, and graceful shutdown so existing connections drain and new requests route to api-2?

Celebrity Post Fan-Out

A celebrity with 50M followers posts a photo. The fan-out service must write to 50M follower feeds. This saturates the write pipeline and delays other users' feeds by minutes. How do you implement lazy fan-out, hybrid push/pull, and priority queuing for celebrity accounts?

Pre-compute timelines: when a user posts, fan out to every follower's Redis timeline (a sorted set). Feed reads then need no joins: a single range read from Redis returns the page. Works for users with < 1M followers.

For celebrity users (1M+ followers), skip fan-out-on-write. Instead, merge their recent posts into the timeline at read time. The feed API does: read Redis timeline + fetch celebrity posts + merge and rank.
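The read-time merge can be sketched as a two-pointer merge of two lists that are each already sorted newest first. This assumes each item carries its timestamp; `FeedItem` and `mergeNewestFirst` are hypothetical names, and ranking and de-duplication are left out.

```typescript
type FeedItem = { postId: string; timestamp: number }

// Merge two newest-first lists into one newest-first page of at most `limit` items.
function mergeNewestFirst(a: FeedItem[], b: FeedItem[], limit: number): FeedItem[] {
  const merged: FeedItem[] = []
  let i = 0
  let j = 0
  while (merged.length < limit && (i < a.length || j < b.length)) {
    // Take from `a` when `b` is exhausted or a's head is at least as new as b's head.
    const takeA = j >= b.length || (i < a.length && a[i].timestamp >= b[j].timestamp)
    merged.push(takeA ? a[i++] : b[j++])
  }
  return merged
}

// Pre-computed timeline (from Redis) and celebrity posts (fetched at read time).
const precomputed: FeedItem[] = [
  { postId: 'p3', timestamp: 300 },
  { postId: 'p1', timestamp: 100 },
]
const celebrityPosts: FeedItem[] = [
  { postId: 'c2', timestamp: 250 },
  { postId: 'c1', timestamp: 50 },
]

const page = mergeNewestFirst(precomputed, celebrityPosts, 3).map(item => item.postId)
// page is p3, c2, p1
```

The merge stops as soon as the page is full, so its cost is bounded by the page size rather than by how many posts either source holds.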

Use Kafka to make fan-out async: Write API publishes a PostCreated event to Kafka. Fan-out workers consume events and write to followers' Redis timelines in the background — decoupled from the post write.

Step 4: Wrap Up

Decision	Choice	Why
Redis for Pre-Computed Timelines	Each user has a Redis sorted set of post IDs (scored by timestamp).	Building a feed from the database requires expensive JOINs across posts, follows, and ranking tables.
Kafka for Async Fan-Out Workers	Write API publishes PostCreated to Kafka.	Synchronous fan-out makes post latency proportional to follower count.
Object Storage for Media	Photos and videos are uploaded directly to S3 using pre-signed URLs.	At 100M DAU posting 1 photo each, that's 100M images/day.

Key design decisions

If the interviewer asks you to scale 10×: identify the single bottleneck (usually the database write path) and address it first, before scaling horizontally.

10× target: 50K RPS, the load at which your architecture must hold.
