API Gateway

Networking·30 min read

An API Gateway is the front door to your microservices. It handles cross-cutting concerns that every service needs: authentication, rate limiting, TLS termination, request routing, and response transformation. Without it, every microservice has to implement these concerns, auth included, on its own.

§1 Step 1: Understand the Problem & Establish Design Scope

Candidate: Before I start designing, let me clarify the requirements. What scale are we targeting, in requests per second?

Interviewer: About 10,000 requests per second at peak. The system routes traffic to three backend services: User, Product, and Order.

Candidate: What's our latency budget? Is there an SLA on the gateway layer itself?

Interviewer: The gateway must add less than 10ms of overhead at p99. Total end-to-end p99 should stay under 200ms.

Candidate: Does auth happen at the gateway or does each service handle it?

Interviewer: All auth should be centralized at the gateway. Services behind it trust the gateway completely, with no duplicate auth logic.

Candidate: Are we building multi-tenant? Do different clients get different rate limits?

Interviewer: Yes. The free tier gets 100 req/min per client and the paid tier gets 10,000 req/min. The gateway enforces this.

Back of the Envelope

→Peak traffic: 10,000 RPS × 86,400 s ≈ 864M requests/day (an upper bound, assuming peak load is sustained all day)

→Auth token cache: avg token size 256 bytes × 1M active users = 256 MB in Redis

→Rate limit counters: 2 keys per client (minute + day window) × 100K clients × 8 bytes = ~1.6 MB

→Gateway latency budget: 10ms max overhead → token validation must be <2ms (Redis P99 ~0.5ms)

→Horizontal scale: a single gateway node handles ~50K RPS, so one node covers the 10K RPS peak on capacity alone; run 3 nodes for redundancy and autoscale beyond that as traffic grows
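
The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the inputs stated in the list; the variable names are illustrative, and the counter size ignores Redis per-key overhead, so real memory use will be higher.

```python
# Back-of-the-envelope figures, using only the inputs stated above.
PEAK_RPS = 10_000
SECONDS_PER_DAY = 24 * 60 * 60

TOKEN_BYTES = 256
ACTIVE_USERS = 1_000_000

KEYS_PER_CLIENT = 2        # minute window + day window
CLIENTS = 100_000
COUNTER_BYTES = 8          # counter value only; per-key overhead is ignored

NODE_CAPACITY_RPS = 50_000

requests_per_day = PEAK_RPS * SECONDS_PER_DAY              # assumes peak all day
token_cache_mb = TOKEN_BYTES * ACTIVE_USERS / 1_000_000
rate_limit_mb = KEYS_PER_CLIENT * CLIENTS * COUNTER_BYTES / 1_000_000
nodes_for_capacity = -(-PEAK_RPS // NODE_CAPACITY_RPS)     # ceiling division

print(f"{requests_per_day:,} requests/day")     # 864,000,000 requests/day
print(f"{token_cache_mb:.0f} MB token cache")   # 256 MB token cache
print(f"{rate_limit_mb:.1f} MB rate counters")  # 1.6 MB rate counters
print(f"{nodes_for_capacity} node(s) needed for capacity")  # 1 node(s) needed for capacity
```

Capacity alone needs a single node; any additional nodes are there for availability rather than throughput.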

§2 Step 2: High-Level Design

Client → Load Balancer → API Gateway (Auth Check via Redis, Rate Limit Counter via Redis) → [User Service | Product Service | Order Service]

Figure 1. API Gateway sits between clients and all backend services, centralizing auth, rate limiting, and routing.

Every client request hits the load balancer first, which distributes traffic across multiple gateway instances. Each gateway instance validates the Bearer token against Redis (sub-millisecond lookup), checks the rate limit counter, then forwards the request to the appropriate upstream service.
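
The per-request flow described above (validate the token, check the rate limit, then forward) can be sketched as a single function. This is an illustrative model, not a gateway implementation: the `Gateway` class, its dictionaries, and the status details are hypothetical stand-ins for the Redis lookups, and the counter covers only one rate-limit window.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Gateway:
    # Hypothetical in-memory stand-ins for the two Redis lookups.
    token_cache: Dict[str, str]          # bearer token -> user id
    limits: Dict[str, int]               # user id -> allowed requests per window
    counters: Dict[str, int] = field(default_factory=dict)

    def handle(self, headers: Dict[str, str]) -> Tuple[int, str]:
        """Return (HTTP status, detail) for one request in a single window."""
        auth = headers.get("Authorization", "")
        if not auth.startswith("Bearer "):
            return 401, "missing bearer token"

        user_id = self.token_cache.get(auth[len("Bearer "):])
        if user_id is None:
            return 401, "unknown or expired token"

        count = self.counters.get(user_id, 0) + 1
        self.counters[user_id] = count
        if count > self.limits.get(user_id, 0):
            return 429, "rate limit exceeded"

        return 200, f"forward as {user_id}"


gw = Gateway(token_cache={"tok-abc": "user123"}, limits={"user123": 2})
request = {"Authorization": "Bearer tok-abc"}
print(gw.handle(request))  # (200, 'forward as user123')
print(gw.handle(request))  # (200, 'forward as user123')
print(gw.handle(request))  # (429, 'rate limit exceeded')
```

Auth runs before the rate-limit check so that unauthenticated traffic never consumes a client's quota.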

Gateway API Contract (HTTP)

# All requests pass through the gateway on port 443

POST   /v1/users/register        → User Service
GET    /v1/users/:id             → User Service

GET    /v1/products              → Product Service
GET    /v1/products/:id          → Product Service

POST   /v1/orders                → Order Service
GET    /v1/orders/:id            → Order Service

# Headers added by gateway before forwarding:
X-User-ID: <validated-user-id>
X-Request-ID: <uuid>
X-Forwarded-For: <client-ip>

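The gateway strips the Authorization header before forwarding, and services receive a trusted X-User-ID header instead. This keeps JWT validation logic in one place and prevents services from accidentally trusting unvalidated tokens. For the same reason, the gateway must overwrite any X-User-ID header supplied by the client; otherwise a caller could claim an arbitrary identity.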
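
Routing and header handling from the contract above can be sketched as two small functions. The service labels, function names, and sample values are hypothetical. As a simplification, `X-Forwarded-For` is set outright, whereas a production proxy usually appends to an existing value.

```python
import uuid
from typing import Dict, Optional

# Route prefixes from the contract; the service names are labels, not hostnames.
ROUTES = {
    "/v1/users": "user-service",
    "/v1/products": "product-service",
    "/v1/orders": "order-service",
}

# Headers the gateway owns; client-supplied copies are always discarded.
GATEWAY_OWNED = {"authorization", "x-user-id", "x-request-id", "x-forwarded-for"}


def resolve_upstream(path: str) -> Optional[str]:
    """Map a request path to its upstream service, or None if no route matches."""
    for prefix, service in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return service
    return None


def build_upstream_headers(incoming: Dict[str, str], user_id: str,
                           client_ip: str) -> Dict[str, str]:
    """Drop gateway-owned headers from the client, then add trusted values."""
    headers = {k: v for k, v in incoming.items() if k.lower() not in GATEWAY_OWNED}
    headers["X-User-ID"] = user_id
    headers["X-Request-ID"] = str(uuid.uuid4())
    headers["X-Forwarded-For"] = client_ip
    return headers


print(resolve_upstream("/v1/orders/42"))  # order-service
print(resolve_upstream("/v2/unknown"))    # None
```

Filtering header names case-insensitively matters because HTTP header names are case-insensitive, so a client could otherwise send a lowercase `x-user-id` past the filter.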

§3 Step 3: Deep Dive

Two decisions dominate gateway performance: how you validate tokens and which rate limiting algorithm you use. Both have hard latency requirements and must survive Redis being temporarily unavailable.

Algorithm	Memory per client	Allows burst?	Accuracy	Winner?
Token Bucket	~24 bytes	Yes (up to bucket size)	Exact	No: complex to implement in Redis atomically
Leaky Bucket	~16 bytes	No (fixed output rate)	Exact	No: rejects legitimate burst traffic
Fixed Window Counter	~8 bytes	2× at boundary	Approximate	No: up to 2× the limit can pass across a window reset
Sliding Window Counter	~16 bytes	Partial (weighted)	~99% accurate	Yes: two counters per client, weighted estimate removes the boundary exploit

Rate limiting algorithm comparison — pick sliding window counter for this system.

Sliding window counter in Redis (atomic Lua script)

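-- Called on every request. Returns 1 if allowed, 0 if rate limited.
-- Sliding window counter: the previous fixed window's count is weighted by
-- how much of that window still overlaps the sliding window.
-- Key suffix is floor(now / window), computed by the caller.
local curr_key = KEYS[1]         -- e.g. "rl:user123:minute:28333334" (current window)
local prev_key = KEYS[2]         -- e.g. "rl:user123:minute:28333333" (previous window)
local limit  = tonumber(ARGV[1]) -- e.g. 100
local window = tonumber(ARGV[2]) -- e.g. 60 (seconds)
local now    = tonumber(ARGV[3]) -- caller's Unix time in seconds

local prev = tonumber(redis.call('GET', prev_key) or '0')
local curr = tonumber(redis.call('GET', curr_key) or '0')
local overlap = 1 - (now % window) / window
local estimate = prev * overlap + curr

if estimate + 1 > limit then
  return 0  -- rate limited
end

redis.call('INCR', curr_key)
redis.call('EXPIRE', curr_key, window * 2)  -- must outlive its turn as the previous window
return 1    -- allowed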
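
The boundary behaviour in the comparison table can be checked with a small in-memory simulation. This sketch models both counters in plain Python, with no Redis involved; the function names and the burst pattern are illustrative assumptions. It sends 100 requests in the last half second of one minute and 100 more in the first half second of the next, against a limit of 100 per minute.

```python
def fixed_window_allowed(timestamps, limit, window):
    """Count requests admitted by a fixed window counter."""
    counts = {}
    allowed = 0
    for t in timestamps:
        w = int(t // window)
        if counts.get(w, 0) < limit:
            counts[w] = counts.get(w, 0) + 1
            allowed += 1
    return allowed


def sliding_window_allowed(timestamps, limit, window):
    """Count requests admitted by a sliding window counter."""
    counts = {}
    allowed = 0
    for t in timestamps:
        w = int(t // window)
        # Share of the previous fixed window still inside the sliding window.
        overlap = 1 - (t % window) / window
        estimate = counts.get(w - 1, 0) * overlap + counts.get(w, 0)
        if estimate + 1 <= limit:
            counts[w] = counts.get(w, 0) + 1
            allowed += 1
    return allowed


# Two bursts of 100 requests straddling the minute boundary at t = 60 s.
burst = [59.5 + i / 200 for i in range(100)] + [60 + i / 200 for i in range(100)]

print(fixed_window_allowed(burst, limit=100, window=60))    # 200
print(sliding_window_allowed(burst, limit=100, window=60))  # 100
```

The fixed window admits twice the limit within one second, while the weighted estimate still counts almost all of the previous minute and rejects the second burst.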

Option	Latency overhead	Ops complexity	Plugin ecosystem	Best for
NGINX + Lua	<1ms	Low	Manual	Simple routing, low traffic (<1K RPS)
Kong (OSS)	1–3ms	Medium	Rich (150+ plugins)	This system — 10K RPS, plugin-driven auth
Envoy Proxy	1–2ms	High	gRPC-first	Service mesh, polyglot microservices
AWS API Gateway	5–15ms	Very low	AWS-native	Serverless backends, AWS lock-in acceptable
Custom Go proxy	<0.5ms	High	None	Latency-critical, unique requirements

Gateway implementation options — Kong wins for this scale.

The gateway sits on the critical path of every request, so it must not become a single point of failure. Run at least 3 instances behind a load balancer. If Redis is unavailable, fall back to in-memory rate limiting with a 10-second window; never reject all traffic because the cache is down.
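
The fallback described above can be sketched as a wrapper around the shared check. Everything here is a hypothetical illustration: `shared_check` stands for whatever callable performs the Redis-backed check, and it is assumed to raise Python's built-in `ConnectionError` on failure. With a real Redis client you would catch that client's own connection exception instead.

```python
import time


class LocalFixedWindow:
    """Per-process limiter used only while the shared store is unreachable."""

    def __init__(self, limit, window_seconds=10, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.state = {}   # client id -> (window index, count)

    def allow(self, client_id):
        window_index = int(self.clock() // self.window)
        index, count = self.state.get(client_id, (window_index, 0))
        if index != window_index:
            count = 0     # a new window has started; reset the counter
        self.state[client_id] = (window_index, count + 1)
        return count + 1 <= self.limit


def allow_request(client_id, shared_check, local):
    """Prefer the shared check; degrade to the local limiter if it is down."""
    try:
        return shared_check(client_id)
    except ConnectionError:
        return local.allow(client_id)


def redis_down(client_id):
    raise ConnectionError("redis unavailable")


fallback = LocalFixedWindow(limit=2, clock=lambda: 5.0)
print([allow_request("client-1", redis_down, fallback) for _ in range(3)])
# [True, True, False]
```

Each gateway instance counts independently in this mode, so with N instances a client can reach up to N times the local limit until Redis returns. That trade-off favours availability over strict enforcement.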

§4 Step 4: Wrap Up

Decision	Choice	Why
Auth strategy	JWT validated at gateway, X-User-ID forwarded	Centralized auth, services stay stateless
Token cache	Redis with TTL matching JWT expiry	Sub-ms validation, automatic invalidation on expiry
Rate limiting	Sliding window counter in Redis Lua script	Atomic, accurate, no boundary exploits
Gateway software	Kong OSS	Plugin ecosystem covers auth, rate limiting, logging out of the box
Availability	3+ instances + Redis fallback to in-memory	No single point of failure on the critical path

Key decisions summary.
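
The "TTL matching JWT expiry" row can be made concrete with a small helper. This sketch assumes the token carries the standard `exp` claim (seconds since the Unix epoch); the function name is hypothetical.

```python
import time
from typing import Optional


def cache_ttl_seconds(exp_claim: int, now: Optional[float] = None) -> int:
    """Seconds a validated token may stay cached before its `exp` passes."""
    current = time.time() if now is None else now
    return max(0, int(exp_claim - current))


print(cache_ttl_seconds(1_700_000_600, now=1_700_000_000))  # 600
print(cache_ttl_seconds(1_700_000_000, now=1_700_000_600))  # 0
```

A result of 0 means the token has already expired, so it should be rejected rather than cached.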

If the interviewer asks to scale 10× (100K RPS): move rate-limiting state to its own Redis cluster, sharded by a hash of user_id. Gateway instances are stateless, so they scale horizontally. The token cache hit rate should stay above 95% because active tokens are hot.
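
Sharding by user_id hash can be sketched as a stable hash followed by a modulo. The function name and shard count are assumptions for illustration. Plain modulo placement remaps most keys whenever the shard count changes, which is why consistent hashing or Redis Cluster's fixed hash slots are the usual alternatives.

```python
import hashlib


def shard_for(user_id: str, shard_count: int) -> int:
    """Pick a shard index that is stable across processes and restarts."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Every gateway instance computes the same shard for the same user.
shard = shard_for("user123", shard_count=4)
print(0 <= shard < 4)  # True
```

A cryptographic hash is used here instead of Python's built-in `hash()` because the latter is randomized per process for strings, so two gateway instances would disagree on placement.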
