Open on desktop

Antimetal's interactive diagrams require a larger screen. Open this page on your laptop or desktop to continue.

Best on desktop

Back to lesson

Consistent Hashing

beginnerAlgorithmsDistribution

Fundamentals·25 min read

Consistent Hashing

Understand how consistent hashing minimizes key remapping when nodes join or leave a distributed cluster.

AlgorithmsDistribution

§1Problem Statement

Learn how to minimize data movement when scaling your database cluster. Map keys to nodes using a hash ring.

Add an API Server

§2Step-by-Step Build

Add an API Server

Drag an API Server onto the canvas. This is the service that needs to route data to the correct shard.

What it does

The application layer that knows how to compute consistent hash keys and route requests to the correct shard in the ring. Every write and read must go to the exact node that 'owns' that key.

Why it matters

Without a consistent hash routing layer, your API would need to know every shard's range explicitly. Consistent hashing lets you add or remove shards without updating routing tables on every other node.

Trade-off

The API server holds the ring topology in memory (or fetches from a coordination service like ZooKeeper). Ring membership changes must propagate before routing is updated — brief window of stale routing is possible.

Real world

DynamoDB's internal coordinator routes requests using consistent hashing before clients ever see a response. Cassandra clients embed the ring topology directly — each driver knows which node owns a partition key without a coordinator hop.

Capacity math

API server with hash routing adds <0.1ms overhead per request. Ring lookup is O(log N) with a sorted array of virtual node positions.

Add the Hash Ring Coordinator

Drag a Load Balancer to represent the consistent hash ring manager. Connect it from the API Server.

What it does

The routing layer that maps hash values to physical nodes. Maintains the ring data structure — a sorted array of virtual node positions. For each key, binary search finds the next clockwise node on the ring.

Why it matters

With simple modulo hashing (key % N), adding one node changes the target for nearly every key — ~80% remapping for going from 4 to 5 nodes. Consistent hashing moves only 1/N keys when a node is added — just 20% for 5 nodes.

Trade-off

The coordinator is a stateful component. If it loses ring membership state, requests may route to wrong nodes. Must persist ring state to durable storage and replicate coordinator for HA.

Real world

Apache Cassandra uses a coordinator pattern where each node knows the full ring. Discord's message cluster uses consistent hashing to route messages to the correct storage node with zero resharding cost when adding capacity.

Capacity math

Ring lookup: O(log N) with binary search over sorted virtual node array. Routing overhead: <0.5ms. Virtual nodes (vnodes) per physical node: typically 256 to ensure balanced distribution.

Add the Primary Database Shard

Drag a Database and connect it from the Load Balancer. This represents shard 1 on the hash ring.

What it does

The first physical node on the hash ring. It owns a contiguous arc of hash space — roughly 1/N of the total key space when virtual nodes are evenly distributed. All keys that hash to this arc are stored here.

Why it matters

Horizontal partitioning (sharding) is the only way to scale writes beyond what a single machine can handle. At Twitter scale (6,000 writes/sec), no single PostgreSQL instance can keep up — you need multiple shards.

Trade-off

'Hot spots' occur when some keys are accessed far more than others (celebrity accounts on Twitter). Consistent hashing distributes keys, not access patterns. You may still need separate hot-key handling (Redis cache in front).

Real world

DynamoDB splits key space across thousands of storage nodes using consistent hashing. A single DynamoDB table can scale to 40,000 read/write capacity units — impossible without partitioning. Instagram's user database is sharded across dozens of PostgreSQL nodes.

Capacity math

Single PostgreSQL shard: 5,000 reads/sec, 2,000 writes/sec. With 10 shards: 50k reads/sec, 20k writes/sec — linear scaling when load is evenly distributed.

Add a Read Replica for the Shard

Drag a Read Replica and connect it from the Database. Each shard can have replicas for fault tolerance.

What it does

A synchronous or asynchronous replica of the primary shard. In distributed systems using consistent hashing (Cassandra, DynamoDB), replicas own the same hash range as the primary and provide both fault tolerance and read scaling.

Why it matters

If a shard node fails without a replica, data in that arc of the ring is lost or inaccessible. Cassandra's replication factor of 3 means 3 nodes own each key — you can lose 2 before losing data.

Trade-off

Replication introduces write latency (waiting for N of R replicas to acknowledge). Strong consistency (all replicas) = higher latency. Eventual consistency (quorum or async) = lower latency but possible stale reads.

Real world

Cassandra with replication_factor=3 writes to 3 nodes per key. Netflix runs Cassandra at RF=3 — losing an entire availability zone doesn't impact reads. DynamoDB automatically maintains 3 replicas per shard across AZs.

Capacity math

RF=3: survive 2 node failures per ring segment. Quorum reads/writes (RF/2+1 = 2 of 3): balance between consistency and availability. Read throughput: 3x the primary shard's capacity when all replicas serve reads.

Add a Cache for Hot Keys

Drag a Cache and connect it from the API Server. Prevent hot keys from overloading specific shards.

What it does

An in-memory cache (Redis/Memcached) that sits in front of the hash ring to absorb reads for frequently accessed keys. When consistent hashing places a celebrity's account data on shard 3, all celebrity reads hit shard 3 — the cache prevents this.

Why it matters

Consistent hashing distributes keys evenly, not access patterns. A viral tweet's author may be on shard 7, causing 10,000 reads/sec to hit shard 7 while others sit idle. Cache intercepts these 'hot keys' before they reach the ring.

Trade-off

Cache must be invalidated when the underlying data changes. In a consistent hashing system where writes go directly to the ring, cache invalidation requires coordination between write path and cache layer. Common solution: cache-aside pattern with short TTL (1-30 seconds).

Real world

Twitter identified ~3,000 'hot' user accounts that drove disproportionate read traffic. They added a special cache tier just for these accounts — consistent hashing handles the long tail, cache handles the hot head. Facebook's Memcached fleet serves 1 billion requests per second to shield MySQL.

Capacity math

Redis: 100k+ reads/sec per instance. Cache hit rates of 95%+ reduce ring traffic to 5% of total reads. For 500k reads/sec: cache absorbs 475k, ring sees only 25k — shards stay healthy.

§310x Scaling Paths

From prototype to planet-scale

Introduce consistent hashing to redistribute load as you add nodes — minimize cache/shard remapping.
Add read replicas for each primary database; route all reads there to reduce primary write contention.
Front every service with a rate limiter to protect against abuse as traffic multiplies.
Move session and frequently-read data into a distributed cache layer before your database becomes the bottleneck.

What's next