Open on desktop

Antimetal's interactive diagrams require a larger screen. Open this page on your laptop or desktop to continue.

Best on desktop

Back to lesson

Global Distributed System

advancedMulti-RegionReplication

Distributed Systems·70 min read

Global Distributed System

1Understand the Problem & Establish Design Scope→2High-Level Design→3Deep Dive→4Wrap Up

Multi-RegionReplication

§1Step 2 — High-Level Design

2High-Level Design

Design a multi-region system achieving 99.99% uptime. GeoDNS, cross-region replication, async event queues, and global observability.

System architecture overview

Stage 1 of 4Starting state — the problem to solve

Progressive build — add each component step by step

Add Regional Database Replicas

Deploy Postgres read replicas in EU and APAC. Each regional API reads locally from its replica — eliminating 100-200ms cross-region read latency.

What it does

Postgres streaming replication maintains real-time replicas in EU and APAC. The write-ahead log (WAL) from the US primary streams continuously to replica servers in each region. Regional APIs read from local replicas; writes are still routed to the US primary. Each regional replica applies WAL entries as they arrive, maintaining a lagging but eventually consistent copy of all data. Replicas expose a read-only SQL interface, allowing the regional API to execute SELECT queries at sub-5ms latency against local disk and memory rather than waiting for network round trips to the US.

Why it matters

US-EU network round trip is 80-120ms. US-APAC is 200-300ms. Reading from a local replica at 2-5ms is 50-100× faster than cross-region reads. At scale (10K read QPS per region), redirecting reads to local replicas eliminates cross-region congestion on the database tier and frees US primary capacity for write traffic from all regions. This separation of read and write paths is fundamental to scaling distributed databases.

Trade-off

Async replication means replicas may lag 10-100ms behind primary. Reads may return slightly stale data. For strong consistency, read from primary — but accept the 100-200ms latency cost. Most applications accept eventual consistency at this latency; critical reads (account balance, payment status) can use a read-from-primary flag or direct primary reads for a latency penalty.

Real world

GitHub uses Postgres streaming replication across 3 regions with 10-50ms lag. Shopify runs Vitess MySQL replicas in 6 regions for sub-10ms read latency. Amazon DynamoDB Global Tables achieve <1 second eventual consistency across continents using a similar pattern of asynchronous replication.

Capacity math

Streaming replication adds ~1-5% CPU overhead to primary. Each replica can handle 10K read QPS independently without performance degradation. EU and APAC replicas serve their regions without touching US primary. At 50K total read QPS split evenly across regions, 3 replicas (US primary, EU, APAC) eliminate all cross-region read traffic.

In the real world: GitHub uses Postgres streaming replication across 3 regions with 10-50ms lag. Shopify runs Vitess MySQL replicas in 6 regions for sub-10ms read latency. Amazon DynamoDB Global Tables achieve <1 second eventual consistency across continents using a similar pattern of asynchronous replication.

Add CDN for Global Edge Delivery

Add CDN nodes in front of your Anycast GSLB. Static assets (JS, CSS, images) are served from 200+ edge PoPs — API servers never see static traffic.

What it does

A Content Delivery Network (CDN) operates a global network of edge nodes (Points of Presence, or PoPs) in cities worldwide. When a user requests a static asset, the CDN's anycast DNS returns the IP of the nearest edge node. The edge node caches the asset based on Cache-Control headers from your origin. Only cache misses — typically the first request from a geographic region or after TTL expiration — traverse the network to your origin API servers. Subsequent requests from the same region or ISP are served at edge speed with microsecond latencies.

Why it matters

US-to-APAC latency is 200ms+. A CDN PoP in Singapore serves APAC users at 5-20ms. For 80% of requests (static), CDN eliminates cross-region latency entirely. At 10,000 requests/second globally, a CDN handling 8,000/sec at edge means your origin API only needs to serve 2,000/sec of dynamic traffic and cache misses — a 5× capacity reduction for origin servers.

Trade-off

CDN caching requires careful Cache-Control headers. Static assets use long TTLs (1 year with cache-busting URLs like /assets/app-abc123.js). Dynamic API responses must bypass CDN (no-cache headers) or use short TTLs (60 seconds). Purging is needed for emergency updates — most CDNs offer instant purge APIs. Users behind strict corporate proxies may bypass CDN entirely.

Real world

Cloudflare's CDN handles 57M requests/second globally. Netflix serves all video through CDN PoPs co-located with ISPs, achieving < 50ms latency for 99% of streams. A typical e-commerce site using Cloudflare sees 70%+ cache hit rate for static assets, reducing origin load by 5-10×.

Capacity math

Cloudflare has 300+ PoPs globally. A typical CDN PoP handles 100Gbps throughput. At 90% cache hit rate, origin serves only 10% of traffic — enabling 10× smaller regional API fleet. Each PoP keeps hot assets in RAM (sub-millisecond) and disk (10-100ms). Cold assets (rarely accessed) bypass cache and fetch from origin (200ms+).

In the real world: Cloudflare's CDN handles 57M requests/second globally. Netflix serves all video through CDN PoPs co-located with ISPs, achieving < 50ms latency for 99% of streams. A typical e-commerce site using Cloudflare sees 70%+ cache hit rate for static assets, reducing origin load by 5-10×.

Add a Distributed Cache Per Region

Add Redis in each region to cache hot API responses and session data locally, preventing cache misses from crossing regions.

What it does

Each region deploys a dedicated Redis cluster caching hot database query results and session tokens. When an API server in EU receives a request, it first checks the EU Redis cache for the data (e.g., user profile, product listing). On cache hit, the API returns immediately (~1ms). On miss, the API queries the local database replica. After the query completes, the API stores the result in Redis with a TTL (typically 5-60 minutes). Subsequent requests from any EU client hit Redis within 1ms.

Why it matters

At peak traffic, hot rows (trending content, popular user profiles) are read millions of times per second. Caching in Redis keeps per-region database replica load manageable. A user profile queried 10K times per second in EU would stress the replica CPU; caching eliminates 9,500 of those replica queries. Regional caches are isolated — EU Redis only caches data relevant to EU requests, not global state.

Trade-off

Regional caches are not shared — if a user logs in from EU then APAC, their session may not be in APAC's cache. Use a global session store (DynamoDB Global Tables, Redis Cluster across regions) for session tokens. Invalidation is also tricky: if a product price changes, all 3 regional caches must purge it. Use pub/sub (Kafka, Redis Streams) to broadcast invalidation events to all regions.

Real world

AWS ElastiCache Global Datastore replicates Redis across regions with < 1s lag. Cloudflare Workers KV provides a globally replicated key-value store with < 100ms eventual consistency. Twitter uses regional Memcached clusters in each datacenter for timeline caching.

Capacity math

A 3-region deployment with 64GB Redis each handles 3M concurrent sessions at ~20KB each. Regional cache hit rates of 85%+ reduce database replica load to < 15% of total read traffic. At peak, EU Redis handles 50K read QPS at 1ms latency each without CPU saturation. A single 64GB Redis node processes 100K ops/sec.

In the real world: AWS ElastiCache Global Datastore replicates Redis across regions with < 1s lag. Cloudflare Workers KV provides a globally replicated key-value store with < 100ms eventual consistency. Twitter uses regional Memcached clusters in each datacenter for timeline caching.

US-EU Network Partition: A network partition splits US and EU regions. The primary database is in US, replicas in EU/APAC cannot sync. What happens to writes from EU users? Can EU region become write-capable with multi-master replication, or do writes queue until partition heals?

§2Step 3 — Deep Dive

3Deep Dive

Strategy	Write latency	Consistency	Conflict risk	Best for	Cost	Ops burden
Single-region (no replication)	< 5ms writes	Strong	None	Simple apps, one geography	Low	Low
Read replicas (async)	< 5ms writes	Eventual	None (read-only)	Read-heavy global apps ✓	Medium	Low
Multi-master (async)	< 5ms writes	Eventual	Yes (last-write-wins)	Write-heavy, availability first	High	High
Multi-master (sync)	100-300ms writes	Strong	None (serialized)	Finance, critical writes	High	High
CRDTs (conflict-free)	< 5ms writes	Eventual	Auto-resolved	Collaborative editing, counters	Medium	High

Multi-region replication strategies — async replication + eventual consistency is the pragmatic default.

typescriptGlobal Distributed — region-aware read routing with replica fallback

import { Pool } from 'pg'

type Region = 'us' | 'eu' | 'ap'

const primaryPool = new Pool({ host: process.env.DB_PRIMARY_HOST })
const replicaPools: Record<Region, Pool> = {
  us: new Pool({ host: process.env.DB_REPLICA_US_HOST }),
  eu: new Pool({ host: process.env.DB_REPLICA_EU_HOST }),
  ap: new Pool({ host: process.env.DB_REPLICA_AP_HOST }),
}

function detectRegion(req: Request): Region {
  // Cloud providers set this header (AWS: CloudFront-Viewer-Country)
  const country = req.headers.get('CloudFront-Viewer-Country') ?? 'US'
  if (['DE', 'FR', 'GB', 'NL'].includes(country)) return 'eu'
  if (['JP', 'AU', 'SG', 'IN'].includes(country)) return 'ap'
  return 'us'
}

export async function query(
  req: Request,
  sql: string,
  params: unknown[],
  options: { write?: boolean; strongRead?: boolean } = {}
): Promise<unknown[]> {
  if (options.write || options.strongRead) {
    // Writes and strong reads always go to primary
    const { rows } = await primaryPool.query(sql, params)
    return rows
  }

  // Route reads to nearest regional replica
  const region = detectRegion(req)
  try {
    const { rows } = await replicaPools[region].query(sql, params)
    return rows
  } catch {
    // Replica failure — fall back to primary
    const { rows } = await primaryPool.query(sql, params)
    return rows
  }
}

Component	Why Add It	Tradeoff
Regional Database Replicas	US-EU network round trip is 80-120ms.	Async replication means replicas may lag 10-100ms behind primary.
CDN for Global Edge Delivery	US-to-APAC latency is 200ms+.	CDN caching requires careful Cache-Control headers.
Distributed Cache Per Region	At peak traffic, hot rows (trending content, popular user profiles) are read millions of times per second.	Regional caches are not shared — if a user logs in from EU then APAC, their session may not be in APAC's cache.

Design decision tradeoffs

US-EU Network Partition

A network partition splits US and EU regions. The primary database is in US, replicas in EU/APAC cannot sync. What happens to writes from EU users? Can EU region become write-capable with multi-master replication, or do writes queue until partition heals?

Hot Partition — Trending Content

A single database row (trending video) gets hammered with reads from all 3 regions. Regional caches can't keep up; replica CPU spikes to 95%. Does the system degrade gracefully or cascade to a wider outage?

Fan-Out Overload — Global Update

A single write (system config update) must fan out to all 3 regions' replicas within 10 seconds. At 50K write/sec, can replication stream keep up, or do replicas fall behind and start serving stale config?

EU and APAC regions reading from US primary adds 100-200ms cross-region latency. Add Postgres read replicas in EU and APAC — each region reads locally (< 5ms) and replication handles sync.

Writes from EU/APAC still go to US primary — adding 100ms. For write-heavy workloads, use multi-master replication (CockroachDB, YugabyteDB) with conflict resolution via last-write-wins or CRDTs.

Anycast routing sends users to the nearest region automatically. Add a CDN layer in front of the Anycast GSLB for static content — CDN edge serves 90%+ of requests before they reach your application.

§3Step 4 — Wrap Up

4Wrap Up

Decision	Choice	Why
Regional Database Replicas	Postgres streaming replication maintains real-time replicas in EU and APAC.	US-EU network round trip is 80-120ms.
CDN for Global Edge Delivery	A Content Delivery Network (CDN) operates a global network of edge nodes (Points of Presence, or PoPs) in cities worldwide.	US-to-APAC latency is 200ms+.
Distributed Cache Per Region	Each region deploys a dedicated Redis cluster caching hot database query results and session tokens.	At peak traffic, hot rows (trending content, popular user profiles) are read millions of times per second.

Key design decisions

If the interviewer asks to scale 10×: 10x the load — architectural moves that work. Identify the single bottleneck (usually the database write path) and address it first before horizontal scaling.

10× Target100K RPSwhere your architecture must hold