Open on desktop
Antimetal's interactive diagrams require a larger screen. Open this page on your laptop or desktop to continue.
Consistent Hashing
Understand how consistent hashing minimizes key remapping when nodes join or leave a distributed cluster.
§1Problem Statement
Learn how to minimize data movement when scaling your database cluster. Map keys to nodes using a hash ring.
§2Step-by-Step Build
Drag an API Server onto the canvas. This is the service that needs to route data to the correct shard.
The application layer that knows how to compute consistent hash keys and route requests to the correct shard in the ring. Every write and read must go to the exact node that 'owns' that key.
Without a consistent hash routing layer, your API would need to know every shard's range explicitly. Consistent hashing lets you add or remove shards without updating routing tables on every other node.
The API server holds the ring topology in memory (or fetches from a coordination service like ZooKeeper). Ring membership changes must propagate before routing is updated — brief window of stale routing is possible.
DynamoDB's internal coordinator routes requests using consistent hashing before clients ever see a response. Cassandra clients embed the ring topology directly — each driver knows which node owns a partition key without a coordinator hop.
API server with hash routing adds <0.1ms overhead per request. Ring lookup is O(log N) with a sorted array of virtual node positions.
Drag a Load Balancer to represent the consistent hash ring manager. Connect it from the API Server.
The routing layer that maps hash values to physical nodes. Maintains the ring data structure — a sorted array of virtual node positions. For each key, binary search finds the next clockwise node on the ring.
With simple modulo hashing (key % N), adding one node changes the target for nearly every key — ~80% remapping for going from 4 to 5 nodes. Consistent hashing moves only 1/N keys when a node is added — just 20% for 5 nodes.
The coordinator is a stateful component. If it loses ring membership state, requests may route to wrong nodes. Must persist ring state to durable storage and replicate coordinator for HA.
Apache Cassandra uses a coordinator pattern where each node knows the full ring. Discord's message cluster uses consistent hashing to route messages to the correct storage node with zero resharding cost when adding capacity.
Ring lookup: O(log N) with binary search over sorted virtual node array. Routing overhead: <0.5ms. Virtual nodes (vnodes) per physical node: typically 256 to ensure balanced distribution.
Drag a Database and connect it from the Load Balancer. This represents shard 1 on the hash ring.
The first physical node on the hash ring. It owns a contiguous arc of hash space — roughly 1/N of the total key space when virtual nodes are evenly distributed. All keys that hash to this arc are stored here.
Horizontal partitioning (sharding) is the only way to scale writes beyond what a single machine can handle. At Twitter scale (6,000 writes/sec), no single PostgreSQL instance can keep up — you need multiple shards.
'Hot spots' occur when some keys are accessed far more than others (celebrity accounts on Twitter). Consistent hashing distributes keys, not access patterns. You may still need separate hot-key handling (Redis cache in front).
DynamoDB splits key space across thousands of storage nodes using consistent hashing. A single DynamoDB table can scale to 40,000 read/write capacity units — impossible without partitioning. Instagram's user database is sharded across dozens of PostgreSQL nodes.
Single PostgreSQL shard: 5,000 reads/sec, 2,000 writes/sec. With 10 shards: 50k reads/sec, 20k writes/sec — linear scaling when load is evenly distributed.
Drag a Read Replica and connect it from the Database. Each shard can have replicas for fault tolerance.
A synchronous or asynchronous replica of the primary shard. In distributed systems using consistent hashing (Cassandra, DynamoDB), replicas own the same hash range as the primary and provide both fault tolerance and read scaling.
If a shard node fails without a replica, data in that arc of the ring is lost or inaccessible. Cassandra's replication factor of 3 means 3 nodes own each key — you can lose 2 before losing data.
Replication introduces write latency (waiting for N of R replicas to acknowledge). Strong consistency (all replicas) = higher latency. Eventual consistency (quorum or async) = lower latency but possible stale reads.
Cassandra with replication_factor=3 writes to 3 nodes per key. Netflix runs Cassandra at RF=3 — losing an entire availability zone doesn't impact reads. DynamoDB automatically maintains 3 replicas per shard across AZs.
RF=3: survive 2 node failures per ring segment. Quorum reads/writes (RF/2+1 = 2 of 3): balance between consistency and availability. Read throughput: 3x the primary shard's capacity when all replicas serve reads.
Drag a Cache and connect it from the API Server. Prevent hot keys from overloading specific shards.
An in-memory cache (Redis/Memcached) that sits in front of the hash ring to absorb reads for frequently accessed keys. When consistent hashing places a celebrity's account data on shard 3, all celebrity reads hit shard 3 — the cache prevents this.
Consistent hashing distributes keys evenly, not access patterns. A viral tweet's author may be on shard 7, causing 10,000 reads/sec to hit shard 7 while others sit idle. Cache intercepts these 'hot keys' before they reach the ring.
Cache must be invalidated when the underlying data changes. In a consistent hashing system where writes go directly to the ring, cache invalidation requires coordination between write path and cache layer. Common solution: cache-aside pattern with short TTL (1-30 seconds).
Twitter identified ~3,000 'hot' user accounts that drove disproportionate read traffic. They added a special cache tier just for these accounts — consistent hashing handles the long tail, cache handles the hot head. Facebook's Memcached fleet serves 1 billion requests per second to shield MySQL.
Redis: 100k+ reads/sec per instance. Cache hit rates of 95%+ reduce ring traffic to 5% of total reads. For 500k reads/sec: cache absorbs 475k, ring sees only 25k — shards stay healthy.
§310x Scaling Paths
- Introduce consistent hashing to redistribute load as you add nodes — minimize cache/shard remapping.
- Add read replicas for each primary database; route all reads there to reduce primary write contention.
- Front every service with a rate limiter to protect against abuse as traffic multiplies.
- Move session and frequently-read data into a distributed cache layer before your database becomes the bottleneck.