Open on desktop

Antimetal's interactive diagrams require a larger screen. Open this page on your laptop or desktop to continue.

Best on desktop

Back to lesson

Facebook Messenger

advancedMessagingEncryption

Large-Scale·80 min read

Facebook Messenger

1Understand the Problem & Establish Design Scope→2High-Level Design→3Deep Dive→4Wrap Up

MessagingEncryption

§1Step 2 — High-Level Design

2High-Level Design

Deliver 100B messages per day. Multi-device sync, end-to-end encryption, and offline delivery.

System architecture overview

Stage 1 of 5Starting state — the problem to solve

Progressive build — add each component step by step

Add Additional WebSocket Gateways

Add more WebSocket gateway nodes beyond the starter. Messenger shards the 1B+ concurrent connection load horizontally across many gateway servers.

What it does

WebSocket gateways maintain persistent TCP connections from Messenger clients. Each gateway handles a slice of the user population (sharded by userID). A session registry maps userID+deviceID to the gateway currently holding that socket.

Why it matters

Horizontal sharding of connections prevents any single gateway from becoming a bottleneck. When a message arrives for user A, the fan-out service looks up A's gateway in the session registry and forwards only to that server.

Trade-off

Adding gateways increases the cost of global broadcasts (e.g., system messages). Messenger avoids broadcasting by design — almost all messages are point-to-point or to a group of defined members.

Real world

Meta's Iris system replaced MQTT with a custom protocol for Messenger connections. At 1.3B users, Meta runs O(10K) gateway processes across multiple datacenters.

Capacity math

~100K connections per gateway process (each connection: ~8 KB memory). 1B concurrent connections = ~10K gateway processes. Session registry entry: ~50 bytes/connection.

In the real world: Meta's Iris system replaced MQTT with a custom protocol for Messenger connections. At 1.3B users, Meta runs O(10K) gateway processes across multiple datacenters.

Add Kafka for Message Routing

Add Kafka to decouple message receipt from delivery. The Chat API publishes messages to Kafka; the fan-out service consumes and delivers to all recipient devices.

What it does

The Chat API persists the message to the NoSQL store, then publishes to Kafka (keyed by conversationID). The fan-out service consumes from Kafka, expands group membership, and routes per-device delivery to WebSocket gateways or the push gateway.

Why it matters

Kafka decouples the write path from delivery. The Chat API acknowledges the sender as soon as the message is persisted — delivery happens asynchronously. This keeps write latency low regardless of how many recipients need to be notified.

Trade-off

Async delivery via Kafka adds 2–10ms delivery latency vs direct synchronous delivery. For Messenger's 100ms p99 target, this is acceptable. The tradeoff enables horizontal scaling of fan-out independent of the write path.

Real world

Meta's Iris messaging system uses a similar async fanout pattern. WhatsApp (also Meta) uses a different approach — direct XMPP delivery — but Messenger's scale requires the Kafka-backed model.

Capacity math

Kafka partitioned by conversationID. Fan-out message rate: billions/day. For a 500-member group chat: 500 delivery instructions per message. Kafka throughput target: 10M messages/second cluster-wide.

In the real world: Meta's Iris messaging system uses a similar async fanout pattern. WhatsApp (also Meta) uses a different approach — direct XMPP delivery — but Messenger's scale requires the Kafka-backed model.

Add Redis Tiers

Add Redis clusters for session routing (userID → gatewayID mapping) and presence state (online/offline/last seen per user).

What it does

Session cache: maps (userID, deviceID) → (gatewayID, connectionTimestamp). Updated on connect/disconnect with CAS operations to handle reconnects. Presence cache: stores last-seen timestamp and online status per user with TTL — gateway heartbeats refresh it.

Why it matters

Without a session cache, delivering a message to user A requires broadcasting to all gateways and checking if A is connected — O(gateways) per delivery. With a session cache, it's O(1): look up A's gateway, send directly.

Trade-off

Session cache entries can become stale if a gateway crashes without cleanly evicting connections. Messenger uses short TTLs (30–60s) with gateway heartbeat refresh to bound staleness. Stale entries fall back to push notification.

Real world

Meta's session registry for Messenger is a distributed Redis cluster with multi-datacenter replication. At 1B users with ~2 devices each: 2B session entries × 50 bytes = 100 GB of session state.

Capacity math

Session entry: ~50 bytes. 2B entries = 100 GB. Presence entry: ~20 bytes. Heartbeat rate: 1/30s per connected device. At 1B online: 33M heartbeats/second to Redis.

In the real world: Meta's session registry for Messenger is a distributed Redis cluster with multi-datacenter replication. At 1B users with ~2 devices each: 2B session entries × 50 bytes = 100 GB of session state.

Add Wide-Column Message Store

Add a wide-column NoSQL database (HBase/Cassandra) partitioned by conversationID to durably store all messages and serve conversation history.

What it does

Each row key = conversationID. Within a row, messages are ordered by sequence number (monotonically increasing per conversation). Columns: senderID, content, type (text/media/reaction), timestamp, read receipts. Read receipt state is stored as a sparse column per member.

Why it matters

Messages are write-heavy and read in conversation-order chunks (pagination). Wide-column stores are optimized for this access pattern: sequential reads within a partition are fast, and writes are append-mostly (new message = new column).

Trade-off

Wide-column stores trade query flexibility for write/range-scan performance. Ad-hoc queries (e.g., 'find all messages from user X in any conversation') require secondary indexes or full scans. Messenger restricts queries to conversation-scoped access.

Real world

Meta built their own HBase-based message store called ZippyDB and later Apache Hive/RocksDB derivatives. WhatsApp stores messages client-side; Messenger stores server-side for cross-device sync.

Capacity math

100B messages/day. Avg message: 500 bytes. Daily ingest: 50 TB. With 30-day retention: 1.5 PB. Partitioned across thousands of HBase RegionServers.

In the real world: Meta built their own HBase-based message store called ZippyDB and later Apache Hive/RocksDB derivatives. WhatsApp stores messages client-side; Messenger stores server-side for cross-device sync.

WebSocket Gateway Failure: ws-gateway-a crashes. 100K users lose their long-lived WebSocket connections. All clients attempt to reconnect simultaneously, causing a thundering herd on remaining gateways. How do you implement backoff-with-jitter reconnect, connection state persistence in cache-1, and load shedding on the surviving gateways?

§2Step 3 — Deep Dive

3Deep Dive

Protocol	Overhead	Battery impact	Message ordering	Best for	Cost	Ops burden
MQTT over TLS	2-byte fixed header	Very low (keep-alive ping)	Per-topic ordered	Mobile messaging, IoT, billions of devices ✓	Low	Low
WebSocket	Variable frame header	Medium (persistent TCP)	Application-level	Web chat, real-time dashboards	Low	Medium
HTTP/2 push	HTTP headers (~50B)	Low (multiplexed)	Stream-level	Web push notifications, PWA	Low	Low
Long-polling	~800B HTTP header	High (new conn per msg)	None (client-ordered)	Firewall-friendly legacy fallback	Low	Low
XMPP	XML overhead (~500B)	Medium	None built-in	Legacy enterprise IM (Jabber)	Low	Low

Mobile messaging protocols — MQTT wins for battery-constrained always-on connections.

pythonHBase message storage schema + inbox fan-out

import time
import happybase

conn = happybase.Connection('hbase-thrift:9090')
table = conn.table('messages')

def store_message(conv_id: str, msg_id: str, sender_id: str, content: bytes):
    ts = int(time.time() * 1000)
    # Reverse timestamp: newer messages have smaller row keys -> efficient range scan
    rk = f"{conv_id}#{(2**63 - ts):020d}#{msg_id}".encode()
    table.put(rk, {
        b'msg:sender_id': sender_id.encode(),
        b'msg:content':   content,
        b'msg:type':      b'text',
        b'msg:status':    b'delivered',
    })

def get_recent_messages(conv_id: str, limit: int = 50):
    prefix = f"{conv_id}#".encode()
    return list(table.scan(row_prefix=prefix, limit=limit))

# Inbox fan-out: write to each participant's inbox table
# For group chats >1000 members: async fan-out via message queue
# Read receipts: separate HBase table, updated on delivery ACK from MQTT broker

Component	Why Add It	Tradeoff
Additional WebSocket Gateways	Horizontal sharding of connections prevents any single gateway from becoming a bottleneck.	Adding gateways increases the cost of global broadcasts (e.
Kafka for Message Routing	Kafka decouples the write path from delivery.	Async delivery via Kafka adds 2–10ms delivery latency vs direct synchronous delivery.
Redis Tiers	Without a session cache, delivering a message to user A requires broadcasting to all gateways and checking if A is connected — O(gateways) per delivery.	Session cache entries can become stale if a gateway crashes without cleanly evicting connections.
Wide-Column Message Store	Messages are write-heavy and read in conversation-order chunks (pagination).	Wide-column stores trade query flexibility for write/range-scan performance.

Design decision tradeoffs

WebSocket Gateway Failure

ws-gateway-a crashes. 100K users lose their long-lived WebSocket connections. All clients attempt to reconnect simultaneously, causing a thundering herd on remaining gateways. How do you implement backoff-with-jitter reconnect, connection state persistence in cache-1, and load shedding on the surviving gateways?

Chat Store Partition

db-1 (message store) is partitioned from ws-gateway-a. Messages are sent by users but can't be persisted. How do you buffer messages in cache-1, deduplicate on recovery, and reconcile message ordering when db-1 reconnects?

Group Chat Fan-Out

A group with 250 members all send messages simultaneously during a live event. Each message requires fan-out to 249 other members' WebSocket connections. How do you implement server-side pub/sub, delivery receipts, and deferred fan-out for offline members?

Separate the long-lived connection tier from the request tier. Clients should enter through an edge gateway, then attach to multiple WebSocket gateways sharded by userID or conversation hash. Keep a dedicated session cache that maps userID and deviceID to the gateway currently holding that socket.

Never deliver large-group messages synchronously in the request path. Persist the message, append it to Kafka, and let a fan-out service expand conversation membership into device-level deliveries. Online devices are routed to the owning WebSocket gateway; offline devices fall back to the push gateway.

Multi-device sync means one user may have several active sockets plus offline devices. Use the durable message store as the source of truth partitioned by conversationID and ordered by sequence number, then let reconnecting devices pull from their last acknowledged offset while presence data and read receipts remain in a separate fast cache.

§3Step 4 — Wrap Up

4Wrap Up

Decision	Choice	Why
Additional WebSocket Gateways	WebSocket gateways maintain persistent TCP connections from Messenger clients.	Horizontal sharding of connections prevents any single gateway from becoming a bottleneck.
Kafka for Message Routing	The Chat API persists the message to the NoSQL store, then publishes to Kafka (keyed by conversationID).	Kafka decouples the write path from delivery.
Redis Tiers	Session cache: maps (userID, deviceID) → (gatewayID, connectionTimestamp).	Without a session cache, delivering a message to user A requires broadcasting to all gateways and checking if A is connected — O(gateways) per delivery.
Wide-Column Message Store	Each row key = conversationID.	Messages are write-heavy and read in conversation-order chunks (pagination).

Key design decisions

If the interviewer asks to scale 10×: 10x the load — architectural moves that work. Identify the single bottleneck (usually the database write path) and address it first before horizontal scaling.

10× Target10.0M RPSwhere your architecture must hold

What's next

Event-Driven Architecture

30 min read