Open on desktop

Antimetal's interactive diagrams require a larger screen. Open this page on your laptop or desktop to continue.

Best on desktop

Back to lesson

Real-Time Chat System

intermediateWebSocketMessaging

Realtime·50 min read

Real-Time Chat System

1Understand the Problem & Establish Design Scope→2High-Level Design→3Deep Dive→4Wrap Up

WebSocketMessaging

§1Step 2 — High-Level Design

2High-Level Design

Build a chat system supporting 50M concurrent users. WebSockets, presence, and message ordering.

System architecture overview

Stage 1 of 4Starting state — the problem to solve

Progressive build — add each component step by step

Add Redis Pub/Sub for Cross-Gateway Fan-out

Connect Redis to both WebSocket gateways. Messages published to a room channel are delivered to all subscribers on any gateway.

What it does

Redis pub/sub is a lightweight messaging layer where clients (WebSocket gateways) subscribe to named channels (one per chat room) and publish messages to them. When WS Gateway 1 receives a user message, it publishes to 'room:room-id:messages'. Both WS Gateway 1 and WS Gateway 2 are subscribed to the same channel and receive the published message, allowing them to fan-out to their respective connected clients. This creates a shared message bus across all WebSocket servers without requiring direct client-to-client communication. Redis maintains subscription state in memory, making pub/sub operations extremely fast (sub-millisecond), but messages are not persisted—only clients actively subscribed at publish time receive them.

Why it matters

At scale, hundreds of WebSocket servers handle different clients. Without a shared message bus, users in the same room but on different servers would miss each other's messages. Each gateway would only communicate with its local clients, breaking the illusion of a unified chat room. Redis pub/sub solves this by acting as a central fanout hub: one publish reaches all subscribed gateways simultaneously, and each gateway delivers to its connected users. This architecture allows linear scaling of client connections across multiple servers while keeping message latency low.

Trade-off

Redis pub/sub has no persistence — messages are lost if no subscriber is connected at publish time. If a client is offline and a message arrives, they won't see it in pub/sub (only in message history if stored separately). This is acceptable for live chat (users expect to miss messages when offline) but problematic for critical notifications. Additionally, pub/sub creates memory overhead on Redis (subscription state) and network overhead if a single room has users across 100+ gateways (Redis broadcasts to all subscribers). For extremely large fan-outs (>10K subscribers per channel), single-channel architecture can become a bottleneck.

Real world

Discord uses Redis pub/sub for real-time message fan-out across their WebSocket servers. Each Discord guild (server) has a Redis channel, and all connected gateways subscribe to channels their users have open. Slack uses a similar model with additional deduplication to handle retries. WhatsApp routes through a custom pub/sub system built on their erlang infrastructure for handling billions of messages daily. The pattern is proven for 1M+ concurrent users as long as hotspots are partitioned.

Capacity math

Redis pub/sub handles 1M+ messages/second per node with sub-millisecond latency. A single channel can sustain 10K+ simultaneous subscribers without performance degradation. Fan-out to 1000 subscribers takes <1ms. Memory overhead is minimal (~100 bytes per subscription). Network overhead for publish is O(1)—Redis sends one publish command regardless of subscriber count, broadcasting internally.

In the real world: Discord uses Redis pub/sub for real-time message fan-out across their WebSocket servers. Each Discord guild (server) has a Redis channel, and all connected gateways subscribe to channels their users have open. Slack uses a similar model with additional deduplication to handle retries. WhatsApp routes through a custom pub/sub system built on their erlang infrastructure for handling billions of messages daily. The pattern is proven for 1M+ concurrent users as long as hotspots are partitioned.

Add Kafka for Durable Message Queue

Route messages through Kafka before Postgres. Kafka buffers write spikes, enables replay for offline users, and decouples WS gateways from the database.

What it does

Kafka acts as a buffering layer between WebSocket gateways and the persistent message store. When a user sends a message, the WS gateway simultaneously publishes to Redis pub/sub (for real-time delivery) and to a Kafka topic (for durability). Kafka persists the message to disk across multiple brokers before acknowledging the write, ensuring no loss even if nodes fail. A separate set of consumer processes reads from the Kafka topic and writes to Postgres asynchronously. This dual-path architecture decouples the fast path (pub/sub for live delivery) from the slow path (database writes for history). Messages are available immediately to online users via Redis, while persistence happens in the background.

Why it matters

Postgres writes are synchronous and slow (5-50ms depending on storage). When message volume spikes (e.g., 100K messages/second during an event), direct writes to Postgres cause database contention and p99 latency to spike to seconds. Kafka absorbs the burst: writes to Kafka are fast (1-5ms) because Kafka uses sequential disk I/O and batching. The database consumers then persist messages at a steady rate, preventing write storms. This also enables offline message replay—consumers can read Kafka history and construct a user's message history on demand, and new users joining a room can replay recent messages without querying the full database.

Trade-off

Kafka adds operational complexity (cluster management, topic/partition configuration) and introduces eventual consistency—there's a window where a message is in Redis but not yet in Postgres. If a consumer crashes, messages may be reprocessed and persisted twice (requiring deduplication on read). Additionally, Kafka consumes disk space (messages are retained for X days) and network bandwidth (replication across brokers). For low-volume systems (<1K messages/second), Kafka overhead exceeds benefit; PostgreSQL alone may be sufficient.

Real world

WhatsApp and Facebook Messenger use Kafka extensively for message durability. Discord uses Cassandra (instead of Postgres) accessed through Kafka consumers to handle their 4B message/day volume. Slack uses a similar pattern with message queuing before database persistence. LinkedIn's messaging platform processes billions of messages daily through Kafka. The pattern is industry-standard for any chat system at scale.

Capacity math

Kafka handles millions of messages/second per partition with sub-5ms latency. A single 3-broker cluster can sustain 500MB/s throughput with replication. Messages are retained on disk for 7-30 days depending on configuration. Consumers can replay history at full line rate, and multiple consumer groups can read the same topic independently without interference.

In the real world: WhatsApp and Facebook Messenger use Kafka extensively for message durability. Discord uses Cassandra (instead of Postgres) accessed through Kafka consumers to handle their 4B message/day volume. Slack uses a similar pattern with message queuing before database persistence. LinkedIn's messaging platform processes billions of messages daily through Kafka. The pattern is industry-standard for any chat system at scale.

Add a Presence Service

Add a presence service (Redis-backed) to track online/offline status and route notifications to push gateways for offline users.

What it does

A presence service maintains a real-time map of user ID to WebSocket server assignment and connection state. When a user opens a WebSocket connection to WS Gateway 1, the gateway stores 'user:123 → ws-1:socket-456' in the presence service (typically Redis hash maps). The service also tracks online/offline status and last-seen timestamp. Before fan-out, the gateway checks presence: if a recipient is online (present in the map), deliver via WebSocket to their gateway; if offline, route to a push notification service (APNs/FCM). This enables targeted delivery—real-time for online users, push notifications for offline users.

Why it matters

Fan-out to offline users is wasteful. If you pub/sub a message to a room with 1M users but 50% are offline, you're wasting 500K publish operations to users who won't read the message. Presence-aware routing optimizes fan-out: only deliver via WebSocket to online users (on-demand, high priority) and queue push notifications for offline users separately (lower priority, batched). This also solves the problem of stale subscriptions—if a user closes their app but the WebSocket doesn't cleanly disconnect, the gateway can check presence to detect they're truly offline and not send them messages.

Trade-off

Presence state is eventually consistent. A user may appear online for a few seconds after actually disconnecting (the heartbeat hasn't fired yet). Heartbeat intervals (typically 30-60 seconds) control the lag—shorter intervals reduce stale presence but increase Redis write load. Additionally, presence lookups before every message add latency (1-2ms per lookup) unless you cache presence locally on each gateway. The service itself requires careful handling of connection/disconnection events to avoid stale entries.

Real world

WhatsApp's presence system uses Erlang processes tracking 2B+ users globally, with heartbeats every 30-60 seconds. Discord uses a distributed presence system with 8.5M concurrent voice/video connections tracked in real-time. Slack's presence service powers their status indicators and notification routing. Telegram uses presence for their 'last seen' timestamps. Presence is essential for modern chat systems.

Capacity math

Redis can track 1M active WebSocket sessions with hash maps at <100MB memory (each entry is ~100 bytes). Heartbeat writes at 30-second intervals produce ~33K writes/second per 1M users. Presence lookups are O(1) and take <1ms. Batch heartbeats (send all user heartbeats in a single pipelined command) to reduce round-trips.

In the real world: WhatsApp's presence system uses Erlang processes tracking 2B+ users globally, with heartbeats every 30-60 seconds. Discord uses a distributed presence system with 8.5M concurrent voice/video connections tracked in real-time. Slack's presence service powers their status indicators and notification routing. Telegram uses presence for their 'last seen' timestamps. Presence is essential for modern chat systems.

WebSocket Gateway Failure: WS Gateway 1 crashes. Clients on it are disconnected — how does the system reconnect them to WS Gateway 2 and resume delivery?

§2Step 3 — Deep Dive

3Deep Dive

Approach	Latency	Persistence	Fan-out	Best for	Cost	Ops burden
Long polling	500ms–2s	Yes (DB)	Poor	Legacy browsers, simple cases	Low	Low
Server-Sent Events	< 100ms	Yes	Server→Client only	Notifications, feeds	Low	Low
WebSocket + Redis pub/sub	< 50ms	Via Kafka	Excellent	Chat, live collaboration ✓	Medium	Medium
WebSocket + Kafka direct	< 100ms	Yes	Good	High-durability chat	High	Medium
gRPC streaming	< 20ms	Manual	Good	Internal microservices	Low	Medium

Real-time message delivery strategies — WebSocket + Redis pub/sub is standard.

typescriptReal-Time Chat — WebSocket gateway with Redis pub/sub fan-out

import { WebSocketServer, WebSocket } from 'ws'
import { createClient } from 'redis'

const wss = new WebSocketServer({ port: 8080 })
const publisher = createClient()
const subscriber = createClient()
await publisher.connect()
await subscriber.connect()

// Map room → set of connected WebSocket clients (local to this gateway)
const rooms = new Map<string, Set<WebSocket>>()

wss.on('connection', (ws) => {
  let currentRoom: string | null = null

  ws.on('message', async (data) => {
    const msg = JSON.parse(data.toString())

    if (msg.type === 'join') {
      currentRoom = msg.room
      if (!rooms.has(currentRoom)) rooms.set(currentRoom, new Set())
      rooms.get(currentRoom)!.add(ws)

      // Subscribe to room channel — delivers messages from other gateways
      await subscriber.subscribe(`room:${currentRoom}`, (message) => {
        rooms.get(currentRoom!)?.forEach(client => {
          if (client.readyState === WebSocket.OPEN) client.send(message)
        })
      })
    }

    if (msg.type === 'message' && currentRoom) {
      const payload = JSON.stringify({ room: currentRoom, text: msg.text, ts: Date.now() })
      // Publish to Redis — all gateways subscribed to this room will fan out
      await publisher.publish(`room:${currentRoom}`, payload)
    }
  })

  ws.on('close', () => {
    if (currentRoom) rooms.get(currentRoom)?.delete(ws)
  })
})

Component	Why Add It	Tradeoff
Redis Pub/Sub for Cross-Gateway Fan-out	At scale, hundreds of WebSocket servers handle different clients.	Redis pub/sub has no persistence — messages are lost if no subscriber is connected at publish time.
Kafka for Durable Message Queue	Postgres writes are synchronous and slow (5-50ms depending on storage).	Kafka adds operational complexity (cluster management, topic/partition configuration) and introduces eventual consistency—there's a window where a message is in Redis but not yet in Postgres.
Presence Service	Fan-out to offline users is wasteful.	Presence state is eventually consistent.

Design decision tradeoffs

WebSocket Gateway Failure

WS Gateway 1 crashes. Clients on it are disconnected — how does the system reconnect them to WS Gateway 2 and resume delivery?

Hot Partition in Redis

A popular room (1M users) receives 100K messages/second, creating a bottleneck at the Redis channel. All gateways backlog on Redis pub/sub — message delivery latency hits 5s. How do you shard rooms across Redis partitions?

Network Partition Between Gateways and Redis

WS Gateway 1 loses network access to Redis but remains connected to clients. Clients send messages but they don't fan-out. How do you detect partition, route traffic to healthy gateways, and queue unsent messages?

Two WebSocket gateways can't fan-out to each other's clients. Add Redis pub/sub between them: when WS Gateway 1 receives a message, it publishes to the room's Redis channel. Both gateways subscribe and deliver to their connected clients.

Redis pub/sub is ephemeral — if a client is offline, the message is lost. Add Kafka between the WS gateways and Postgres: WS gateways publish to Kafka, consumers persist to Postgres and handle replay for offline users.

Use sticky sessions (consistent hashing on user ID) to route reconnecting clients back to the same WS gateway — this preserves room subscriptions and reduces Redis pub/sub resubscription load.

§3Step 4 — Wrap Up

4Wrap Up

Decision	Choice	Why
Redis Pub/Sub for Cross-Gateway Fan-out	Redis pub/sub is a lightweight messaging layer where clients (WebSocket gateways) subscribe to named channels (one per chat room) and publish messages to them.	At scale, hundreds of WebSocket servers handle different clients.
Kafka for Durable Message Queue	Kafka acts as a buffering layer between WebSocket gateways and the persistent message store.	Postgres writes are synchronous and slow (5-50ms depending on storage).
Presence Service	A presence service maintains a real-time map of user ID to WebSocket server assignment and connection state.	Fan-out to offline users is wasteful.

Key design decisions

If the interviewer asks to scale 10×: 10x the load — architectural moves that work. Identify the single bottleneck (usually the database write path) and address it first before horizontal scaling.

10× Target20K RPSwhere your architecture must hold

What's next

Event-Driven Architecture

30 min read