Open on desktop
Antimetal's interactive diagrams require a larger screen. Open this page on your laptop or desktop to continue.
Global Distributed System
§1Step 2 — High-Level Design
Design a multi-region system achieving 99.99% uptime. GeoDNS, cross-region replication, async event queues, and global observability.
Deploy Postgres read replicas in EU and APAC. Each regional API reads locally from its replica — eliminating 100-200ms cross-region read latency.
Postgres streaming replication maintains real-time replicas in EU and APAC. The write-ahead log (WAL) from the US primary streams continuously to replica servers in each region. Regional APIs read from local replicas; writes are still routed to the US primary. Each regional replica applies WAL entries as they arrive, maintaining a lagging but eventually consistent copy of all data. Replicas expose a read-only SQL interface, allowing the regional API to execute SELECT queries at sub-5ms latency against local disk and memory rather than waiting for network round trips to the US.
US-EU network round trip is 80-120ms. US-APAC is 200-300ms. Reading from a local replica at 2-5ms is 50-100× faster than cross-region reads. At scale (10K read QPS per region), redirecting reads to local replicas eliminates cross-region congestion on the database tier and frees US primary capacity for write traffic from all regions. This separation of read and write paths is fundamental to scaling distributed databases.
Async replication means replicas may lag 10-100ms behind primary. Reads may return slightly stale data. For strong consistency, read from primary — but accept the 100-200ms latency cost. Most applications accept eventual consistency at this latency; critical reads (account balance, payment status) can use a read-from-primary flag or direct primary reads for a latency penalty.
GitHub uses Postgres streaming replication across 3 regions with 10-50ms lag. Shopify runs Vitess MySQL replicas in 6 regions for sub-10ms read latency. Amazon DynamoDB Global Tables achieve <1 second eventual consistency across continents using a similar pattern of asynchronous replication.
Streaming replication adds ~1-5% CPU overhead to primary. Each replica can handle 10K read QPS independently without performance degradation. EU and APAC replicas serve their regions without touching US primary. At 50K total read QPS split evenly across regions, 3 replicas (US primary, EU, APAC) eliminate all cross-region read traffic.
Add CDN nodes in front of your Anycast GSLB. Static assets (JS, CSS, images) are served from 200+ edge PoPs — API servers never see static traffic.
A Content Delivery Network (CDN) operates a global network of edge nodes (Points of Presence, or PoPs) in cities worldwide. When a user requests a static asset, the CDN's anycast DNS returns the IP of the nearest edge node. The edge node caches the asset based on Cache-Control headers from your origin. Only cache misses — typically the first request from a geographic region or after TTL expiration — traverse the network to your origin API servers. Subsequent requests from the same region or ISP are served at edge speed with microsecond latencies.
US-to-APAC latency is 200ms+. A CDN PoP in Singapore serves APAC users at 5-20ms. For 80% of requests (static), CDN eliminates cross-region latency entirely. At 10,000 requests/second globally, a CDN handling 8,000/sec at edge means your origin API only needs to serve 2,000/sec of dynamic traffic and cache misses — a 5× capacity reduction for origin servers.
CDN caching requires careful Cache-Control headers. Static assets use long TTLs (1 year with cache-busting URLs like /assets/app-abc123.js). Dynamic API responses must bypass CDN (no-cache headers) or use short TTLs (60 seconds). Purging is needed for emergency updates — most CDNs offer instant purge APIs. Users behind strict corporate proxies may bypass CDN entirely.
Cloudflare's CDN handles 57M requests/second globally. Netflix serves all video through CDN PoPs co-located with ISPs, achieving < 50ms latency for 99% of streams. A typical e-commerce site using Cloudflare sees 70%+ cache hit rate for static assets, reducing origin load by 5-10×.
Cloudflare has 300+ PoPs globally. A typical CDN PoP handles 100Gbps throughput. At 90% cache hit rate, origin serves only 10% of traffic — enabling 10× smaller regional API fleet. Each PoP keeps hot assets in RAM (sub-millisecond) and disk (10-100ms). Cold assets (rarely accessed) bypass cache and fetch from origin (200ms+).
Add Redis in each region to cache hot API responses and session data locally, preventing cache misses from crossing regions.
Each region deploys a dedicated Redis cluster caching hot database query results and session tokens. When an API server in EU receives a request, it first checks the EU Redis cache for the data (e.g., user profile, product listing). On cache hit, the API returns immediately (~1ms). On miss, the API queries the local database replica. After the query completes, the API stores the result in Redis with a TTL (typically 5-60 minutes). Subsequent requests from any EU client hit Redis within 1ms.
At peak traffic, hot rows (trending content, popular user profiles) are read millions of times per second. Caching in Redis keeps per-region database replica load manageable. A user profile queried 10K times per second in EU would stress the replica CPU; caching eliminates 9,500 of those replica queries. Regional caches are isolated — EU Redis only caches data relevant to EU requests, not global state.
Regional caches are not shared — if a user logs in from EU then APAC, their session may not be in APAC's cache. Use a global session store (DynamoDB Global Tables, Redis Cluster across regions) for session tokens. Invalidation is also tricky: if a product price changes, all 3 regional caches must purge it. Use pub/sub (Kafka, Redis Streams) to broadcast invalidation events to all regions.
AWS ElastiCache Global Datastore replicates Redis across regions with < 1s lag. Cloudflare Workers KV provides a globally replicated key-value store with < 100ms eventual consistency. Twitter uses regional Memcached clusters in each datacenter for timeline caching.
A 3-region deployment with 64GB Redis each handles 3M concurrent sessions at ~20KB each. Regional cache hit rates of 85%+ reduce database replica load to < 15% of total read traffic. At peak, EU Redis handles 50K read QPS at 1ms latency each without CPU saturation. A single 64GB Redis node processes 100K ops/sec.
§2Step 3 — Deep Dive
Postgres streaming replication maintains real-time replicas in EU and APAC. The write-ahead log (WAL) from the US primary streams continuously to replica servers in each region. Regional APIs read from local replicas; writes are still routed to the US primary. Each regional replica applies WAL entries as they arrive, maintaining a lagging but eventually consistent copy of all data. Replicas expose a read-only SQL interface, allowing the regional API to execute SELECT queries at sub-5ms latency against local disk and memory rather than waiting for network round trips to the US.
| Strategy | Write latency | Consistency | Conflict risk | Best for | Cost | Ops burden |
|---|---|---|---|---|---|---|
| Single-region (no replication) | < 5ms writes | Strong | None | Simple apps, one geography | Low | Low |
| Read replicas (async) | < 5ms writes | Eventual | None (read-only) | Read-heavy global apps ✓ | Medium | Low |
| Multi-master (async) | < 5ms writes | Eventual | Yes (last-write-wins) | Write-heavy, availability first | High | High |
| Multi-master (sync) | 100-300ms writes | Strong | None (serialized) | Finance, critical writes | High | High |
| CRDTs (conflict-free) | < 5ms writes | Eventual | Auto-resolved | Collaborative editing, counters | Medium | High |
Multi-region replication strategies — async replication + eventual consistency is the pragmatic default.
import { Pool } from 'pg'
type Region = 'us' | 'eu' | 'ap'
const primaryPool = new Pool({ host: process.env.DB_PRIMARY_HOST })
const replicaPools: Record<Region, Pool> = {
us: new Pool({ host: process.env.DB_REPLICA_US_HOST }),
eu: new Pool({ host: process.env.DB_REPLICA_EU_HOST }),
ap: new Pool({ host: process.env.DB_REPLICA_AP_HOST }),
}
function detectRegion(req: Request): Region {
// Cloud providers set this header (AWS: CloudFront-Viewer-Country)
const country = req.headers.get('CloudFront-Viewer-Country') ?? 'US'
if (['DE', 'FR', 'GB', 'NL'].includes(country)) return 'eu'
if (['JP', 'AU', 'SG', 'IN'].includes(country)) return 'ap'
return 'us'
}
export async function query(
req: Request,
sql: string,
params: unknown[],
options: { write?: boolean; strongRead?: boolean } = {}
): Promise<unknown[]> {
if (options.write || options.strongRead) {
// Writes and strong reads always go to primary
const { rows } = await primaryPool.query(sql, params)
return rows
}
// Route reads to nearest regional replica
const region = detectRegion(req)
try {
const { rows } = await replicaPools[region].query(sql, params)
return rows
} catch {
// Replica failure — fall back to primary
const { rows } = await primaryPool.query(sql, params)
return rows
}
}| Component | Why Add It | Tradeoff |
|---|---|---|
| Regional Database Replicas | US-EU network round trip is 80-120ms. | Async replication means replicas may lag 10-100ms behind primary. |
| CDN for Global Edge Delivery | US-to-APAC latency is 200ms+. | CDN caching requires careful Cache-Control headers. |
| Distributed Cache Per Region | At peak traffic, hot rows (trending content, popular user profiles) are read millions of times per second. | Regional caches are not shared — if a user logs in from EU then APAC, their session may not be in APAC's cache. |
Design decision tradeoffs
A network partition splits US and EU regions. The primary database is in US, replicas in EU/APAC cannot sync. What happens to writes from EU users? Can EU region become write-capable with multi-master replication, or do writes queue until partition heals?
A single database row (trending video) gets hammered with reads from all 3 regions. Regional caches can't keep up; replica CPU spikes to 95%. Does the system degrade gracefully or cascade to a wider outage?
A single write (system config update) must fan out to all 3 regions' replicas within 10 seconds. At 50K write/sec, can replication stream keep up, or do replicas fall behind and start serving stale config?
§3Step 4 — Wrap Up
| Decision | Choice | Why |
|---|---|---|
| Regional Database Replicas | Postgres streaming replication maintains real-time replicas in EU and APAC. | US-EU network round trip is 80-120ms. |
| CDN for Global Edge Delivery | A Content Delivery Network (CDN) operates a global network of edge nodes (Points of Presence, or PoPs) in cities worldwide. | US-to-APAC latency is 200ms+. |
| Distributed Cache Per Region | Each region deploys a dedicated Redis cluster caching hot database query results and session tokens. | At peak traffic, hot rows (trending content, popular user profiles) are read millions of times per second. |
Key design decisions