Rate Limiter

Beginner · API · Algorithms

Fundamentals · 30 min read

1. Understand the Problem & Establish Design Scope → 2. High-Level Design → 3. Deep Dive → 4. Wrap Up

§1 Step 2: High-Level Design

Design a distributed rate limiter using token bucket and sliding window algorithms. Protect APIs from abuse at scale.

System architecture overview (interactive diagram: a progressive build in four stages that adds one component per step, starting from the problem to solve)

Add Redis for Rate Counter Storage

Connect a Redis cache to the API Gateway to store per-client request counters with atomic increment and expiry.

What it does

A Redis cache stores per-client sliding window counters. Each request atomically increments the counter and checks it against the configured limit.

Why it matters

In-memory Redis handles 1M+ ops/second at sub-millisecond latency. Storing counters in Redis adds < 1ms overhead per request — negligible compared to the actual API call.

Trade-off

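Fixed window counters allow bursts at window boundaries: a client can send up to 2× the limit in a short span that straddles two windows. A sliding window log built on sorted sets is more accurate but uses O(n) memory per client, where n is the number of requests inside the window.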
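To make the boundary burst concrete, here is a minimal in-memory sketch of a fixed window counter. It is not the Redis-backed version, and `FixedWindowLimiter` and `sendBurst` are illustrative names rather than anything from the lesson. With a limit of 5 per minute, a client that sends 5 requests just before a window ends and 5 just after the next one starts gets all 10 through.

```typescript
// Illustrative in-memory fixed window counter. Names are hypothetical.
class FixedWindowLimiter {
  private counts = new Map<string, number>()
  private limit: number
  private windowMs: number

  constructor(limit: number, windowMs: number) {
    this.limit = limit
    this.windowMs = windowMs
  }

  // Returns true if the request at time nowMs is allowed.
  allow(clientId: string, nowMs: number): boolean {
    // All requests that fall in the same window share one counter.
    const windowIndex = Math.floor(nowMs / this.windowMs)
    const key = `${clientId}:${windowIndex}`
    const count = (this.counts.get(key) ?? 0) + 1
    this.counts.set(key, count) // old windows are never evicted in this sketch
    return count <= this.limit
  }
}

// Send n requests at the same instant and count how many are allowed.
function sendBurst(limiter: FixedWindowLimiter, clientId: string, atMs: number, n: number): number {
  let allowed = 0
  for (let i = 0; i < n; i++) {
    if (limiter.allow(clientId, atMs)) allowed++
  }
  return allowed
}

const limiter = new FixedWindowLimiter(5, 60_000)
const endOfWindow = sendBurst(limiter, 'client-a', 59_500, 5) // last 0.5 s of window 0
const startOfNext = sendBurst(limiter, 'client-a', 60_500, 5) // first 0.5 s of window 1
console.log(endOfWindow + startOfNext) // 10: twice the limit within one second
```

Each window on its own respects the limit; the problem only shows up when the two windows are viewed together.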

Real world

Stripe, GitHub, and Twilio all use Redis-backed rate limiting. Cloudflare uses a distributed sliding window counter across edge nodes.

Capacity math

A single Redis node handles 1M+ counter increments/second. At 1000 clients each making 1000 req/min, that's ~17K ops/sec — trivial for Redis.
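The arithmetic behind that estimate is short enough to check directly. This sketch assumes one Redis operation per request; the helper name `counterOpsPerSecond` is made up for illustration. A design that issues several commands per request (for example a sorted-set pipeline) would multiply the result accordingly.

```typescript
// Convert a per-client, per-minute request rate into aggregate operations per second,
// assuming exactly one counter operation per request.
function counterOpsPerSecond(clients: number, requestsPerClientPerMinute: number): number {
  return (clients * requestsPerClientPerMinute) / 60
}

const ops = counterOpsPerSecond(1000, 1000)
console.log(Math.round(ops)) // 16667, i.e. roughly 17K ops/sec
```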

Add a Redis Replica for Read Scaling

At high traffic, read the current counter from a Redis replica to reduce load on the primary.

What it does

A Redis replica receives a real-time copy of all writes and serves read traffic — checking if a client is within limits.

Why it matters

At high traffic, 80% of rate-limit operations are reads (checking count). Replicas handle reads without touching the write-path primary.

Trade-off

Redis replication is asynchronous, so a replica can lag the primary (typically by < 1ms) and may briefly allow a request that the primary would reject. This is acceptable for most rate-limiting use cases.

Real world

Lyft's Ratelimit service uses Redis replicas. Envoy proxy uses Redis cluster with replicas for distributed rate limiting.

Capacity math

Each Redis replica adds ~1M read ops/second capacity. Two replicas sustain 2M rate-limit checks per second.

Add a Circuit Breaker at the Gateway

Protect against Redis unavailability — if the rate-limit store is unreachable, fail open (allow requests) rather than blocking all traffic.

What it does

A circuit breaker monitors Redis health. If Redis is unavailable, it trips open and the gateway bypasses rate limiting rather than returning 500s.

Why it matters

Rate limiting is a soft guarantee — it's better to allow extra requests temporarily than to make your entire API unavailable because Redis is down.

Trade-off

Fail-open means abusive clients get through during Redis downtime. Fail-closed (reject all requests) provides stronger protection but harms legitimate traffic.

Real world

Netflix's Hystrix circuit breaker uses fail-open for non-critical paths. AWS API Gateway falls back to default limits when DynamoDB rate-limit tables are unavailable.

Capacity math

Circuit breakers add microsecond overhead. The 50ms timeout threshold and 5-second recovery window are typical production defaults.
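The fail-open behaviour can be sketched as a small state machine. Everything here is an assumption made for illustration: the class and function names, the failure threshold of 3, and the synchronous `checkStore` callback that stands in for a Redis call guarded by the timeout. A real gateway would make that call asynchronously and treat a timeout as a failure.

```typescript
// Illustrative fail-open circuit breaker for the rate-limit store. Names are hypothetical.
type BreakerState = 'closed' | 'open' | 'half-open'
type StoreVerdict = 'allowed' | 'limited'

class CircuitBreaker {
  private state: BreakerState = 'closed'
  private consecutiveFailures = 0
  private openedAtMs = 0
  private failureThreshold: number
  private recoveryMs: number

  constructor(failureThreshold: number, recoveryMs: number) {
    this.failureThreshold = failureThreshold
    this.recoveryMs = recoveryMs
  }

  // Should the gateway try the store for this request?
  canAttempt(nowMs: number): boolean {
    if (this.state === 'open' && nowMs - this.openedAtMs >= this.recoveryMs) {
      this.state = 'half-open' // recovery window elapsed: let probes through
      return true
    }
    return this.state !== 'open'
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0
    this.state = 'closed'
  }

  recordFailure(nowMs: number): void {
    this.consecutiveFailures++
    // A failed probe, or too many failures in a row, opens the breaker.
    if (this.state === 'half-open' || this.consecutiveFailures >= this.failureThreshold) {
      this.state = 'open'
      this.openedAtMs = nowMs
    }
  }

  currentState(): BreakerState {
    return this.state
  }
}

// checkStore stands in for the Redis call; it throws on error or timeout.
function gatewayDecision(
  breaker: CircuitBreaker,
  checkStore: () => StoreVerdict,
  nowMs: number
): 'allow' | 'reject' {
  if (!breaker.canAttempt(nowMs)) return 'allow' // breaker open: skip the store, fail open
  try {
    const verdict = checkStore()
    breaker.recordSuccess()
    return verdict === 'allowed' ? 'allow' : 'reject'
  } catch {
    breaker.recordFailure(nowMs)
    return 'allow' // store failure: fail open
  }
}

const breaker = new CircuitBreaker(3, 5_000) // 5-second recovery window
const redisDown = (): StoreVerdict => {
  throw new Error('redis unreachable')
}
for (let t = 0; t < 3; t++) gatewayDecision(breaker, redisDown, t)
console.log(breaker.currentState()) // open
```

While the breaker is open the gateway does not touch Redis at all, which is what keeps a slow or dead store from adding latency to every request.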

§2 Step 3: Deep Dive

A Redis cache stores per-client sliding window counters. Each request atomically increments the counter and checks it against the configured limit.

Algorithm	Memory	Burst handling	Accuracy	Best for	Cost	Ops burden
Fixed window counter	O(1)	Poor (edge spikes)	Medium	Simple, coarse limits	Low	Low
Sliding window log	O(requests)	Excellent	High	Strict per-user enforcement	Low	Low
Sliding window counter	O(1)	Good	High	General purpose APIs ✓	Low	Low
Token bucket	O(1)	Excellent	High	Bursty traffic, network	Low	Low
Leaky bucket	O(queue)	None (queued)	High	Smooth output rate control	Low	Low

Rate limiting algorithms: the sliding window counter wins for most APIs.
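The token bucket row is the one algorithm in the table that deliberately permits bursts. As a rough illustration, here is a single-client, in-memory sketch; the class name, the capacity of 10, and the refill rate of 1 token per second are example values, not figures from the lesson.

```typescript
// Illustrative in-memory token bucket for one client. Names and numbers are hypothetical.
class TokenBucket {
  private tokens: number
  private lastRefillMs: number
  private capacity: number
  private refillPerSecond: number

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity
    this.refillPerSecond = refillPerSecond
    this.tokens = capacity // start full so an idle client can burst
    this.lastRefillMs = nowMs
  }

  // Refill based on elapsed time, then spend one token if available.
  tryConsume(nowMs: number): boolean {
    if (nowMs > this.lastRefillMs) {
      const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000
      this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond)
      this.lastRefillMs = nowMs
    }
    if (this.tokens < 1) return false
    this.tokens -= 1
    return true
  }
}

const bucket = new TokenBucket(10, 1, 0)
let burst = 0
for (let i = 0; i < 12; i++) {
  if (bucket.tryConsume(0)) burst++
}
console.log(burst) // 10: the full bucket absorbs the burst, the last two are rejected
```

Capacity bounds the burst size and the refill rate bounds the sustained rate, so the two can be tuned independently.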

TypeScript: Rate Limiter, sliding window log with Redis sorted sets

import { randomUUID } from 'node:crypto'
import { createClient } from 'redis'

const redis = createClient()
// node-redis v4+ needs an explicit connect before commands can be issued
const ready = redis.connect()

export async function checkRateLimit(
  userId: string,
  limitPerMinute: number
): Promise<{ allowed: boolean; remaining: number }> {
  await ready
  const now = Date.now()
  const windowStart = now - 60_000  // 1-minute window
  const key = `rl:${userId}`

  const pipeline = redis.multi()
  // Remove entries at or before the start of the window
  pipeline.zRemRangeByScore(key, 0, windowStart)
  // Count requests already in the window (before this one is added)
  pipeline.zCard(key)
  // Add the current request with its timestamp as score; the random suffix
  // keeps two requests in the same millisecond from overwriting each other
  pipeline.zAdd(key, { score: now, value: `${now}:${randomUUID()}` })
  // Expire idle keys after 2 minutes
  pipeline.expire(key, 120)

  const results = await pipeline.exec()
  const count = Number(results[1])

  if (count >= limitPerMinute) {
    // The rejected request stays in the set, so retries while limited
    // keep counting against the client
    return { allowed: false, remaining: 0 }
  }
  return { allowed: true, remaining: limitPerMinute - count - 1 }
}

Component	Why Add It	Tradeoff
Redis for Rate Counter Storage	In-memory Redis handles 1M+ ops/second at sub-millisecond latency.	Fixed window counters allow bursts at window boundaries (a client can send 2× the limit).
Redis Replica for Read Scaling	At high traffic, 80% of rate-limit operations are reads (checking count).	Replication lag (typically < 1ms for Redis) means a replica may briefly allow a request that the primary would reject.
Circuit Breaker at the Gateway	Rate limiting is a soft guarantee — it's better to allow extra requests temporarily than to make your entire API unavailable because Redis is down.	Fail-open means abusive clients get through during Redis downtime.

Design decision tradeoffs

Gateway Failure

The API Gateway goes down. Requests bypass rate limiting entirely — how does this affect downstream?

Redis Latency Spike

Redis becomes slow (100ms+ response times). The rate limiter blocks legitimate requests while waiting for responses. How do you handle the cascading timeouts?

Traffic Surge Attack

Attacker floods the gateway with 100K req/sec from diverse IPs. The gateway and Redis struggle to keep up. Does your rate limiter protect itself from being the bottleneck?

The API Gateway handles all inbound requests. Add a Redis cache connected to the gateway — it will store sliding window counters keyed by client IP or API key.

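Redis INCR + EXPIRE implements a fixed window counter. For each request: INCR the key, set a TTL if the key is new, check the count against the limit, and return 429 if exceeded. INCR on its own is atomic, but INCR followed by EXPIRE is two commands, so run them in a MULTI/EXEC transaction or a Lua script to avoid leaving a counter without a TTL.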
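The INCR-then-EXPIRE flow can be sketched against a small store interface. This is a sketch under stated assumptions: `CounterStore`, `InMemoryCounterStore`, and `fixedWindowStatus` are hypothetical names, the store is synchronous and in-memory so the flow can run without Redis, and the Lua script in the comment shows one common way to make the two commands a single atomic step on a real server. Note that a counter with a TTL resets all at once when the key expires, so this behaves as a fixed window.

```typescript
// On a real Redis server, the increment and TTL could run atomically as a Lua script:
//   local count = redis.call('INCR', KEYS[1])
//   if count == 1 then redis.call('EXPIRE', KEYS[1], ARGV[1]) end
//   return count

// Hypothetical store interface: increment the key, set the TTL if the key is new,
// and return the new count.
interface CounterStore {
  incrWithTtl(key: string, ttlSeconds: number): number
}

// In-memory stand-in with a settable clock, used only to exercise the flow.
class InMemoryCounterStore implements CounterStore {
  nowMs = 0
  private entries = new Map<string, { count: number; expiresAtMs: number }>()

  incrWithTtl(key: string, ttlSeconds: number): number {
    const existing = this.entries.get(key)
    if (!existing || existing.expiresAtMs <= this.nowMs) {
      // Key is new or expired: start a fresh window with count 1.
      this.entries.set(key, { count: 1, expiresAtMs: this.nowMs + ttlSeconds * 1000 })
      return 1
    }
    existing.count += 1
    return existing.count
  }
}

// Returns the HTTP status the gateway would send: 200 to forward, 429 to reject.
function fixedWindowStatus(
  store: CounterStore,
  clientId: string,
  limit: number,
  windowSeconds: number
): 200 | 429 {
  const count = store.incrWithTtl(`rl:${clientId}`, windowSeconds)
  return count > limit ? 429 : 200
}

const store = new InMemoryCounterStore()
const statuses = [1, 2, 3, 4].map(() => fixedWindowStatus(store, 'client-a', 3, 60))
console.log(statuses.join(',')) // 200,200,200,429
```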

For high accuracy, use Redis sorted sets (ZADD) to store timestamps — this gives true sliding windows instead of fixed windows that can allow 2× the limit at boundaries.
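Between the fixed window and the sorted-set log sits the sliding window counter from the comparison table, which keeps O(1) state by weighting the previous window's count. The sketch below is a single-client, in-memory approximation with hypothetical names; it counts the incoming request before comparing against the limit, which is one of several reasonable conventions.

```typescript
// Illustrative sliding window counter (weighted two-bucket approximation) for one client.
class SlidingWindowCounter {
  private limit: number
  private windowMs: number
  private currentIndex = -1
  private currentCount = 0
  private previousCount = 0

  constructor(limit: number, windowMs: number) {
    this.limit = limit
    this.windowMs = windowMs
  }

  allow(nowMs: number): boolean {
    const index = Math.floor(nowMs / this.windowMs)
    if (index !== this.currentIndex) {
      // Roll forward; if more than one window has passed, the previous bucket is empty.
      this.previousCount = index === this.currentIndex + 1 ? this.currentCount : 0
      this.currentCount = 0
      this.currentIndex = index
    }
    // Share of the previous window that the sliding window still overlaps.
    const elapsedFraction = (nowMs % this.windowMs) / this.windowMs
    const estimated = this.previousCount * (1 - elapsedFraction) + this.currentCount
    if (estimated + 1 > this.limit) return false
    this.currentCount += 1
    return true
  }
}

const counter = new SlidingWindowCounter(5, 60_000)
let beforeBoundary = 0
let afterBoundary = 0
for (let i = 0; i < 5; i++) {
  if (counter.allow(59_500)) beforeBoundary++
}
for (let i = 0; i < 5; i++) {
  if (counter.allow(60_500)) afterBoundary++
}
console.log(beforeBoundary, afterBoundary) // 5 0: the boundary burst is rejected
```

The estimate assumes requests in the previous window were evenly spread, so it can be slightly off in either direction; that is the price of constant memory.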

§3 Step 4: Wrap Up

Decision	Choice	Why
Redis for Rate Counter Storage	A Redis cache stores per-client sliding window counters.	In-memory Redis handles 1M+ ops/second at sub-millisecond latency.
Redis Replica for Read Scaling	A Redis replica receives a real-time copy of all writes and serves read traffic — checking if a client is within limits.	At high traffic, 80% of rate-limit operations are reads (checking count).
Circuit Breaker at the Gateway	A circuit breaker monitors Redis health.	Rate limiting is a soft guarantee — it's better to allow extra requests temporarily than to make your entire API unavailable because Redis is down.

Key design decisions

If the interviewer asks to scale 10×: shard the counters across multiple Redis nodes and use consistent hashing to map each client key to a node, so that adding or removing a node remaps only a small fraction of keys instead of nearly all of them.
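A minimal consistent hashing ring might look like the following. The names (`HashRing`, `fnv1a`, the `redis-N` node labels), the choice of FNV-1a, and the 100 virtual nodes per server are assumptions made for the sketch; a production ring would likely use a stronger hash and a binary search for lookups.

```typescript
// 32-bit FNV-1a hash, chosen only because it is short and deterministic.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash >>> 0
}

// Illustrative hash ring with virtual nodes. Names are hypothetical.
class HashRing {
  private ring: { point: number; node: string }[] = []
  private virtualNodes: number

  constructor(virtualNodes = 100) {
    this.virtualNodes = virtualNodes
  }

  // Place several points per node on the ring to even out the key distribution.
  addNode(node: string): void {
    for (let v = 0; v < this.virtualNodes; v++) {
      this.ring.push({ point: fnv1a(`${node}#${v}`), node })
    }
    this.ring.sort((a, b) => a.point - b.point)
  }

  // The first point clockwise from the key's hash owns the key (linear scan for brevity).
  nodeFor(key: string): string {
    if (this.ring.length === 0) throw new Error('ring has no nodes')
    const h = fnv1a(key)
    const owner = this.ring.find((entry) => entry.point >= h) ?? this.ring[0]
    return owner.node
  }
}

const ring = new HashRing()
for (const node of ['redis-1', 'redis-2', 'redis-3']) ring.addNode(node)

const keys = Array.from({ length: 1000 }, (_, i) => `rl:client-${i}`)
const before = keys.map((k) => ring.nodeFor(k))
ring.addNode('redis-4')
const after = keys.map((k) => ring.nodeFor(k))

const moved = keys.filter((_, i) => before[i] !== after[i]).length
console.log(`${moved} of ${keys.length} keys moved to the new node`)
```

Every key that changes owner moves to the newly added node; keys never shuffle between the existing nodes, which is the property that keeps remapping small.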

10× target: 15K RPS, the load at which your architecture must hold.
