Open on desktop
Antimetal's interactive diagrams require a larger screen. Open this page on your laptop or desktop to continue.
Discord Real-Time
§1Step 2 — High-Level Design
Power 500M users across voice, video, and text. WebRTC at scale, server regions, and bot infrastructure.
Add more WebSocket gateway shards beyond the starter. Discord shards connections by guild ID so each gateway only handles a subset of servers.
WebSocket gateway shards maintain persistent connections from Discord clients. Sharding by guild ID means all members of a guild connect to the same shard, simplifying event fan-out (presence updates, messages) within a guild.
A single WebSocket server cannot maintain millions of persistent connections. Sharding distributes the connection load. Guild-aware sharding also means the gateway can fan-out events to guild members without a distributed lookup — all their connections are local.
Guild-based sharding causes hotspots for very large guilds (Discord's largest have 800K members). Discord handles this with large-guild isolation: huge guilds get their own dedicated shard pool.
Discord runs 100K+ shard processes globally. When a shard crashes, all members of the affected guilds see temporary disconnects. Discord's client uses exponential backoff reconnect to avoid thundering herd.
~2500 guilds per shard (Discord's published sharding factor). At 19M active servers: ~7600 shards. Each shard: ~64K concurrent connections at peak.
Add Kafka to asynchronously route messages, presence events, and voice state updates between gateway shards and backend services.
Kafka partitions events by guild ID. Gateway shards publish message events, presence updates, and typing indicators to Kafka. Fan-out services consume from Kafka and push to the appropriate shard(s) for delivery.
Without Kafka, gateway shards would need to directly coordinate with each other — requiring all-to-all communication as shard count grows. Kafka provides a central bus that decouples producers from consumers and provides durability.
Kafka adds latency (~2–5ms) vs direct shard-to-shard messaging. Discord accepts this latency for durability and operational simplicity, keeping Kafka for message routing and using Redis Pub/Sub for low-latency presence.
Discord processes 4 billion minutes of video per day and millions of messages per second through Kafka pipelines. They use separate Kafka clusters for messages vs analytics.
Discord's Kafka: millions of events/second. Message latency via Kafka: <10ms p99. Kafka cluster: 60+ brokers at peak.
Add separate Redis tiers for session routing (which shard holds a user's connection) and for fast presence state (online/idle/dnd/offline).
Session cache: maps userID → shardID so the fan-out service can route delivery to the correct WebSocket gateway. Presence cache: stores each user's online status with short TTL — gateways heartbeat to refresh; expiry signals disconnect.
When shard A receives a message for a guild, it must fan-out to all online guild members — who may be on any shard. Without a session cache, delivery requires broadcasting to all shards. With it, only the owning shard receives the event.
Presence updates are extremely high volume (heartbeat every 30s from every online user). Discord uses Redis Cluster with read replicas to handle the read fan-out from guild presence aggregation.
Discord's 'Friends List' presence runs on Redis. At peak, 10M+ users are online simultaneously, generating 350K presence updates/second across the Redis cluster.
Session cache: ~100 bytes/user × 10M online users = 1 GB. Presence TTL: 60 seconds. Redis Pub/Sub throughput: 1M+ messages/second per node.
Add a dedicated fan-out service that expands a single message event into per-shard delivery instructions for all online guild members.
The fan-out service consumes from Kafka, looks up guild membership and online status from the presence cache, groups recipients by shard, and publishes per-shard delivery batches. Each shard receives only the events for its connected users.
Fan-out is O(guild size) work. For large guilds, doing this synchronously in the gateway would cause write latency to spike. An async fan-out service isolates this work and can scale horizontally independent of gateways.
Large guild fan-out is inherently expensive. Discord limits some event types (like full presence lists) for guilds >250 members, switching to on-demand presence fetching rather than push-based fan-out.
Discord's 'large guild' feature flag changes delivery semantics for servers >250 members. Events that would require 250K+ fan-out operations are rate-limited or replaced with summarized payloads.
Fan-out workers: horizontally scalable. For 800K member guild: 800K delivery instructions per message. Fan-out rate: ~100K events/second per worker process.
§2Step 3 — Deep Dive
WebSocket gateway shards maintain persistent connections from Discord clients. Sharding by guild ID means all members of a guild connect to the same shard, simplifying event fan-out (presence updates, messages) within a guild.
| Protocol | Direction | Latency | Connection overhead | Best for | Cost | Ops burden |
|---|---|---|---|---|---|---|
| WebSocket | Full-duplex | ~5ms | One TCP conn per client | Chat, gaming, live collab ✓ | Low | Medium |
| Server-Sent Events (SSE) | Server -> Client only | ~10ms | One HTTP conn | Dashboards, notifications (read-only) | Low | Low |
| Long-polling | Request-response | ~100ms | New HTTP req per message | Legacy fallback, firewall-friendly | Low | Low |
| MQTT | Pub/sub, full-duplex | ~2ms | Lightweight 2-byte header | IoT, mobile (battery), billions of devices | Low | Low |
| gRPC streaming | Full-duplex | ~3ms | HTTP/2 multiplexed | Internal services, typed proto contracts | Low | Medium |
Real-time message delivery — WebSocket wins for full-duplex persistent connections.
import WebSocket from 'ws'
import Redis from 'ioredis'
const pub = new Redis()
const sub = new Redis()
// In-memory map: channelId -> Set of live WebSocket connections
const subscribers = new Map<string, Set<WebSocket>>()
// Subscribe to Redis pub/sub for cross-server fan-out
sub.subscribe('discord:messages')
sub.on('message', (_channel, raw) => {
const { channelId, payload } = JSON.parse(raw)
const conns = subscribers.get(channelId)
if (!conns) return
const frame = JSON.stringify(payload)
for (const ws of conns) {
if (ws.readyState === WebSocket.OPEN) {
ws.send(frame)
} else {
conns.delete(ws) // lazy cleanup of dead connections
}
}
})
// When a user sends a message
async function dispatch(channelId: string, message: object) {
const payload = { ...message, ts: Date.now() }
// Publish to all gateway servers -- each fans out to its local subscribers
await pub.publish('discord:messages', JSON.stringify({ channelId, payload }))
}
// ~500K WebSocket connections per gateway server
// Cassandra stores messages durably; Redis is ephemeral delivery only| Component | Why Add It | Tradeoff |
|---|---|---|
| Additional WebSocket Gateway Shards | A single WebSocket server cannot maintain millions of persistent connections. | Guild-based sharding causes hotspots for very large guilds (Discord's largest have 800K members). |
| Kafka Message Queue | Without Kafka, gateway shards would need to directly coordinate with each other — requiring all-to-all communication as shard count grows. | Kafka adds latency (~2–5ms) vs direct shard-to-shard messaging. |
| Redis Tiers | When shard A receives a message for a guild, it must fan-out to all online guild members — who may be on any shard. | Presence updates are extremely high volume (heartbeat every 30s from every online user). |
| Fan-Out Service | Fan-out is O(guild size) work. | Large guild fan-out is inherently expensive. |
Design decision tradeoffs
gateway-shard-a crashes with 50K connected WebSocket clients. All those clients lose their connections and must reconnect. If all 50K reconnect simultaneously, they overwhelm edge-gateway and create a thundering herd. How do you implement staggered reconnect with jitter, session resumption, and connection draining before planned failures?
A network partition isolates cache-1 from the gateway shards. Presence updates can't be written, so all users appear offline to others. How do you implement a local presence cache on each gateway shard, serve stale presence data, and reconcile when connectivity resumes?
A Discord server with 500K members all online receives a single message. The gateway must fan out that event to all 500K concurrent WebSocket connections simultaneously. How do you implement pub/sub with efficient delivery trees to avoid saturating any single gateway shard?
§3Step 4 — Wrap Up
| Decision | Choice | Why |
|---|---|---|
| Additional WebSocket Gateway Shards | WebSocket gateway shards maintain persistent connections from Discord clients. | A single WebSocket server cannot maintain millions of persistent connections. |
| Kafka Message Queue | Kafka partitions events by guild ID. | Without Kafka, gateway shards would need to directly coordinate with each other — requiring all-to-all communication as shard count grows. |
| Redis Tiers | Session cache: maps userID → shardID so the fan-out service can route delivery to the correct WebSocket gateway. | When shard A receives a message for a guild, it must fan-out to all online guild members — who may be on any shard. |
| Fan-Out Service | The fan-out service consumes from Kafka, looks up guild membership and online status from the presence cache, groups recipients by shard, and publishes per-shard delivery batches. | Fan-out is O(guild size) work. |
Key design decisions