Rate Limiter
§1Step 2 — High-Level Design
Design a distributed rate limiter using token bucket and sliding window algorithms. Protect APIs from abuse at scale.
Connect a Redis cache to the API Gateway to store per-client request counters with atomic increment and expiry.
A Redis cache stores per-client sliding window counters. Each request atomically increments the counter and checks it against the configured limit.
Redis keeps its data in memory and can reach 1M+ ops/second (with pipelining) at sub-millisecond latency. Storing counters in Redis typically adds < 1ms overhead per request, which is negligible compared to the actual API call.
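Fixed window counters allow bursts at window boundaries: a client can use its full limit at the end of one window and again at the start of the next, sending up to 2× the limit in a short span. A sliding window built on sorted sets is more accurate but uses O(n) memory per client, where n is the number of requests in the window.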
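The boundary burst can be reproduced with a small in-memory model. The following is a minimal sketch, not the Redis-backed design: `FixedWindowCounter` and `burst` are hypothetical names, and timestamps are passed in explicitly so the result is deterministic. With a limit of 100 per minute, 200 requests are accepted within two seconds.

```typescript
// Hypothetical in-memory fixed window counter (illustration only, not Redis-backed).
class FixedWindowCounter {
  private windowId = -1
  private count = 0
  private readonly limit: number
  private readonly windowMs: number

  constructor(limit: number, windowMs: number) {
    this.limit = limit
    this.windowMs = windowMs
  }

  allow(nowMs: number): boolean {
    const id = Math.floor(nowMs / this.windowMs)
    if (id !== this.windowId) {
      // A new window has started: the counter resets to zero
      this.windowId = id
      this.count = 0
    }
    if (this.count >= this.limit) return false
    this.count += 1
    return true
  }
}

// Send `n` requests at the same instant and count how many are accepted
function burst(limiter: FixedWindowCounter, nowMs: number, n: number): number {
  let accepted = 0
  for (let i = 0; i < n; i++) {
    if (limiter.allow(nowMs)) accepted += 1
  }
  return accepted
}

const fixedLimiter = new FixedWindowCounter(100, 60_000)
// 100 requests one second before the boundary, 100 one second after it
const acceptedBefore = burst(fixedLimiter, 59_000, 100)
const acceptedAfter = burst(fixedLimiter, 61_000, 100)
const acceptedInTwoSeconds = acceptedBefore + acceptedAfter
```

Both bursts are accepted in full because the counter resets at the 60-second mark, even though the two bursts are only two seconds apart.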
Stripe, GitHub, and Twilio all use Redis-backed rate limiting. Cloudflare uses a distributed sliding window counter across edge nodes.
A single Redis node handles 1M+ counter increments/second. With 1,000 clients each making 1,000 req/min, the load is 1,000,000 req/min ÷ 60 ≈ 17K ops/sec at one Redis operation per request, which is trivial for Redis.
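The arithmetic behind that estimate can be written out as a small helper. This is a sketch with a hypothetical function name; the `opsPerRequest` parameter is an assumption added for illustration, since a check that issues several Redis commands per request multiplies the load accordingly.

```typescript
// Convert a per-client request rate into aggregate Redis operations per second.
// `opsPerRequest` is an assumption: 1 for a single increment, more for
// multi-command checks such as a sorted-set sliding window.
function redisOpsPerSecond(
  clients: number,
  reqPerMinutePerClient: number,
  opsPerRequest: number = 1
): number {
  return (clients * reqPerMinutePerClient * opsPerRequest) / 60
}

// 1,000 clients at 1,000 req/min, one Redis op per request
const singleOpLoad = redisOpsPerSecond(1000, 1000)
// Same traffic with four Redis commands per request
const fourOpLoad = redisOpsPerSecond(1000, 1000, 4)
```

Even at four commands per request the load stays in the tens of thousands of ops/sec.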
At high traffic, read the current counter from a Redis replica to reduce load on the primary.
A Redis replica receives a real-time copy of all writes and serves read traffic — checking if a client is within limits.
At high traffic, 80% of rate-limit operations are reads (checking count). Replicas handle reads without touching the write-path primary.
Redis replication is asynchronous, so replication lag (typically < 1ms for Redis) means a replica may briefly allow a request that the primary would reject. This is acceptable for most rate-limiting use cases.
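The effect of a stale replica can be illustrated with a toy model. This is a sketch under loud assumptions: `LaggingReplicaModel` is hypothetical, and the replica here catches up only after every second request, which exaggerates real lag to make the overshoot visible.

```typescript
// Hypothetical model of a primary counter and an asynchronously updated replica.
class LaggingReplicaModel {
  private primary = 0
  private replica = 0

  increment(): void {
    this.primary += 1 // writes always go to the primary
  }

  replicate(): void {
    this.replica = this.primary // the replica catches up with the primary
  }

  readPrimary(): number {
    return this.primary
  }

  readReplica(): number {
    return this.replica
  }
}

const requestLimit = 3
const replicaModel = new LaggingReplicaModel()
const decisions: boolean[] = []
for (let i = 0; i < 5; i++) {
  // Check the count on the replica, then record the request on the primary
  const allowed = replicaModel.readReplica() < requestLimit
  decisions.push(allowed)
  if (allowed) replicaModel.increment()
  // Assumption: replication only catches up after every second request
  if (i % 2 === 1) replicaModel.replicate()
}
const allowedCount = decisions.filter((d) => d).length
```

In this run four requests are allowed against a limit of three: the fourth check reads a replica value that does not yet include the third request.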
Lyft's Ratelimit service uses Redis replicas. Envoy proxy uses Redis cluster with replicas for distributed rate limiting.
Each Redis replica adds ~1M read ops/second capacity. Two replicas sustain 2M rate-limit checks per second.
Protect against Redis unavailability — if the rate-limit store is unreachable, fail open (allow requests) rather than blocking all traffic.
A circuit breaker monitors Redis health. If Redis is unavailable, it trips open and the gateway bypasses rate limiting rather than returning 500s.
Rate limiting is a soft guarantee — it's better to allow extra requests temporarily than to make your entire API unavailable because Redis is down.
Fail-open means abusive clients get through during Redis downtime. Fail-closed (reject all requests) provides stronger protection but harms legitimate traffic.
Netflix's Hystrix circuit breaker uses fail-open for non-critical paths. AWS API Gateway falls back to default limits when DynamoDB rate-limit tables are unavailable.
Circuit breakers add only microseconds of overhead per request. A 50ms timeout threshold and a 5-second recovery window are typical production defaults.
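A fail-open breaker with those defaults might look like the following. This is a minimal sketch, not a production implementation: `CircuitBreaker`, `withTimeout`, and `allowRequest` are hypothetical names, the threshold of 3 consecutive failures is an assumption, and `check` stands in for whatever Redis-backed limiter call the gateway makes.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open'

// Hypothetical breaker; failureThreshold = 3 is an assumed value.
class CircuitBreaker {
  private state: BreakerState = 'closed'
  private failures = 0
  private openedAt = 0
  private readonly failureThreshold: number
  private readonly recoveryMs: number

  constructor(failureThreshold: number = 3, recoveryMs: number = 5_000) {
    this.failureThreshold = failureThreshold
    this.recoveryMs = recoveryMs
  }

  // Should the gateway attempt a Redis call at time `nowMs`?
  canAttempt(nowMs: number): boolean {
    if (this.state === 'open' && nowMs - this.openedAt >= this.recoveryMs) {
      this.state = 'half-open' // recovery window elapsed: allow probe calls
    }
    return this.state !== 'open'
  }

  recordSuccess(): void {
    this.state = 'closed'
    this.failures = 0
  }

  recordFailure(nowMs: number): void {
    this.failures += 1
    // A failed probe, or too many consecutive failures, opens the breaker
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open'
      this.openedAt = nowMs
    }
  }
}

// Reject after `ms` so that a slow Redis call counts as a failure
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error('timeout')), ms)
    work.then(
      (value) => { clearTimeout(timer); resolve(value) },
      (err) => { clearTimeout(timer); reject(err) }
    )
  })
}

// `check` is a placeholder for the real Redis-backed limiter call
async function allowRequest(
  breaker: CircuitBreaker,
  check: () => Promise<boolean>,
  nowMs: number = Date.now()
): Promise<boolean> {
  if (!breaker.canAttempt(nowMs)) return true // fail open: skip rate limiting
  try {
    const allowed = await withTimeout(check(), 50)
    breaker.recordSuccess()
    return allowed
  } catch {
    breaker.recordFailure(nowMs)
    return true // fail open on error or timeout
  }
}
```

Swapping the two `return true` lines for `return false` would turn this into the fail-closed variant, with the tradeoff described above.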
§2Step 3 — Deep Dive
| Algorithm | Memory | Burst handling | Accuracy | Best for | Cost | Ops burden |
|---|---|---|---|---|---|---|
| Fixed window counter | O(1) | Poor (edge spikes) | Medium | Simple, coarse limits | Low | Low |
| Sliding window log | O(requests) | Excellent | High | Strict per-user enforcement | Low | Low |
| Sliding window counter | O(1) | Good | High | General purpose APIs ✓ | Low | Low |
| Token bucket | O(1) | Excellent | High | Bursty traffic, network | Low | Low |
| Leaky bucket | O(queue) | None (queued) | High | Smooth output rate control | Low | Low |
Rate limiting algorithms — sliding window wins for most APIs.
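The O(1) memory figure for the sliding window counter comes from keeping only two counts per client and weighting the previous window by how much of it still overlaps the sliding window. The sketch below is an in-memory illustration with a hypothetical `SlidingWindowCounter` class and caller-supplied timestamps; a Redis-backed version would keep the same two counts in Redis keys.

```typescript
// Hypothetical in-memory sliding window counter: two counts per client.
class SlidingWindowCounter {
  private currentWindow = 0
  private currentCount = 0
  private previousCount = 0
  private readonly limit: number
  private readonly windowMs: number

  constructor(limit: number, windowMs: number) {
    this.limit = limit
    this.windowMs = windowMs
  }

  allow(nowMs: number): boolean {
    const windowIndex = Math.floor(nowMs / this.windowMs)
    if (windowIndex !== this.currentWindow) {
      // Roll over; the old count only matters if it is the directly preceding window
      this.previousCount =
        windowIndex === this.currentWindow + 1 ? this.currentCount : 0
      this.currentCount = 0
      this.currentWindow = windowIndex
    }
    // Fraction of the current window that has elapsed
    const elapsed = (nowMs % this.windowMs) / this.windowMs
    // Weighted estimate of requests in the last `windowMs` milliseconds
    const estimated = this.previousCount * (1 - elapsed) + this.currentCount
    if (estimated >= this.limit) return false
    this.currentCount += 1
    return true
  }
}

const slidingLimiter = new SlidingWindowCounter(100, 60_000)
let slidingAcceptedBefore = 0
let slidingAcceptedAfter = 0
// 100 requests one second before the window boundary
for (let i = 0; i < 100; i++) {
  if (slidingLimiter.allow(59_000)) slidingAcceptedBefore += 1
}
// 100 more requests one second after the boundary
for (let i = 0; i < 100; i++) {
  if (slidingLimiter.allow(61_000)) slidingAcceptedAfter += 1
}
```

Where a fixed window counter would accept the whole second burst, this estimate admits only 2 of the 100 requests, because about 98% of the previous window's count is still attributed to the sliding window.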
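```typescript
import { createClient } from 'redis'
import { randomUUID } from 'node:crypto'

const redis = createClient()
// node-redis v4+ needs an explicit connect before commands are issued
const ready = redis.connect()

// Sliding window over a Redis sorted set: one member per request, scored by timestamp
export async function checkRateLimit(
  userId: string,
  limitPerMinute: number
): Promise<{ allowed: boolean; remaining: number }> {
  await ready
  const now = Date.now()
  const windowStart = now - 60_000 // 1-minute window
  const key = `rl:${userId}`
  // Unique member so two requests in the same millisecond do not overwrite each other
  const member = `${now}:${randomUUID()}`

  // MULTI/EXEC runs the four commands as one atomic transaction
  const pipeline = redis.multi()
  // Remove entries outside the window
  pipeline.zRemRangeByScore(key, 0, windowStart)
  // Count requests already in the window (before this one is added)
  pipeline.zCard(key)
  // Add current request with timestamp as score
  pipeline.zAdd(key, { score: now, value: member })
  // Expire key after 2 minutes
  pipeline.expire(key, 120)
  const results = await pipeline.exec()
  const count = Number(results[1])

  // Note: the entry is added even when the request is rejected,
  // so rejected requests still count toward the window
  if (count >= limitPerMinute) {
    return { allowed: false, remaining: 0 }
  }
  return { allowed: true, remaining: limitPerMinute - count - 1 }
}
```

| Component | Why Add It | Tradeoff |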
|---|---|---|
| Redis for Rate Counter Storage | In-memory Redis handles 1M+ ops/second at sub-millisecond latency. | Fixed window counters allow bursts at window boundaries (a client can send 2× the limit). |
| Redis Replica for Read Scaling | At high traffic, 80% of rate-limit operations are reads (checking count). | Replication lag (typically < 1ms for Redis) means a replica may briefly allow a request that the primary would reject. |
| Circuit Breaker at the Gateway | Rate limiting is a soft guarantee — it's better to allow extra requests temporarily than to make your entire API unavailable because Redis is down. | Fail-open means abusive clients get through during Redis downtime. |
Design decision tradeoffs
The API Gateway goes down and requests bypass rate limiting entirely. How does this affect downstream services?
Redis becomes slow (100ms+ response times). The rate limiter blocks legitimate requests while waiting for responses. How do you handle the cascading timeouts?
Attacker floods the gateway with 100K req/sec from diverse IPs. The gateway and Redis struggle to keep up. Does your rate limiter protect itself from being the bottleneck?
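One possible answer to the last scenario is a cheap in-process check that runs before any Redis call, so that a flood is shed locally instead of being forwarded to Redis. The sketch below uses a token bucket for that local guard; `TokenBucket` is a hypothetical class, the capacity and refill rate are illustrative, and timestamps are passed in so the behaviour is deterministic.

```typescript
// Hypothetical in-process token bucket used as a local guard in front of Redis.
class TokenBucket {
  private tokens: number
  private lastRefillMs: number
  private readonly capacity: number
  private readonly refillPerSecond: number

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity
    this.refillPerSecond = refillPerSecond
    this.tokens = capacity // start full so an initial burst is allowed
    this.lastRefillMs = nowMs
  }

  tryConsume(nowMs: number, cost: number = 1): boolean {
    // Refill in proportion to elapsed time, capped at the bucket capacity
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSecond
    )
    this.lastRefillMs = nowMs
    if (this.tokens < cost) return false
    this.tokens -= cost
    return true
  }
}

// Count how many of `n` simultaneous requests the bucket admits
function admitted(bucket: TokenBucket, nowMs: number, n: number): number {
  let ok = 0
  for (let i = 0; i < n; i++) {
    if (bucket.tryConsume(nowMs)) ok += 1
  }
  return ok
}

// Illustrative numbers: burst of 10, sustained 5 requests/second
const guard = new TokenBucket(10, 5, 0)
const initialBurst = admitted(guard, 0, 12)
const afterOneSecond = admitted(guard, 1_000, 8)
const afterLongIdle = admitted(guard, 100_000, 20)
```

Requests rejected by the local bucket never reach Redis, which keeps the shared store from becoming the bottleneck; the cost is that per-node buckets are not globally consistent across gateway instances.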
§3Step 4 — Wrap Up
| Decision | Choice | Why |
|---|---|---|
| Redis for Rate Counter Storage | A Redis cache stores per-client sliding window counters. | In-memory Redis handles 1M+ ops/second at sub-millisecond latency. |
| Redis Replica for Read Scaling | A Redis replica receives a real-time copy of all writes and serves read traffic — checking if a client is within limits. | At high traffic, 80% of rate-limit operations are reads (checking count). |
| Circuit Breaker at the Gateway | A circuit breaker monitors Redis health. | Rate limiting is a soft guarantee — it's better to allow extra requests temporarily than to make your entire API unavailable because Redis is down. |
Key design decisions