URL Shortener

Beginner · Caching · Database

Fundamentals·45 min read

§1 Step 2 — High-Level Design

Build a URL shortening service from a single server to a globally distributed system handling billions of redirects.

[Interactive diagram: system architecture overview, built up progressively in four stages.]

Add a Redis Cache

Place Redis between API servers and database to cache the mapping of short codes to original URLs.

What it does

Redis is an in-memory cache that stores key-value pairs for sub-millisecond lookup times. For URL shorteners, it caches the mapping of short codes (e.g., 'abc123') to full URLs.

Why it matters

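URL expansion is a hot-path read. Traffic is typically skewed, with roughly 80% of requests hitting 20% of URLs, so at 1000 RPS most redirects target a small hot set. Serving those from cache saves a 10-50ms database round trip per request. A single Redis node can serve on the order of 100k reads/sec at < 1ms latency, and up to around 1M with pipelining.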

Trade-off

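Cache invalidation is necessary if a URL is modified or deleted. Use TTL-based expiry (anywhere from 24 hours to 30 days is typical) or invalidate the key explicitly on delete. With TTL alone, a deleted URL can keep redirecting until its cache entry expires.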
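The two invalidation options can be sketched with a small in-memory stand-in for Redis. `TtlCache`, its method names, and the injectable clock are hypothetical and exist only for illustration; a real deployment would rely on Redis's own key expiry and delete commands.

```typescript
// Hypothetical in-memory stand-in for Redis, used only to illustrate
// TTL-based expiry and invalidate-on-delete. Not a production cache.
type Entry = { value: string; expiresAt: number }

class TtlCache {
  private entries = new Map<string, Entry>()
  private now: () => number

  // `now` is injectable so expiry can be demonstrated without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now
  }

  set(key: string, value: string, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 })
  }

  get(key: string): string | null {
    const entry = this.entries.get(key)
    if (!entry) return null
    if (this.now() >= entry.expiresAt) {
      // Lazy expiry: drop the stale entry on the first read after its TTL.
      this.entries.delete(key)
      return null
    }
    return entry.value
  }

  // Explicit invalidation, called when a URL is modified or deleted.
  invalidate(key: string): void {
    this.entries.delete(key)
  }
}
```

TTL expiry bounds how long a stale mapping can survive without any extra code on the write path, while explicit invalidation removes it immediately at the cost of one more call whenever a URL changes.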

Real world

Bit.ly caches 95% of URL expansions in Redis. TinyURL caches top 10k URLs. Twitter's t.co uses multi-level caching to serve 150k redirects/sec from memory.

Capacity math

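A 16GB Redis node caches ~4M URL mappings if you budget about 4KB per entry (a conservative figure; most mappings are far smaller). Under an 80/20 distribution, the hot set of a 10M-URL service is about 2M mappings, so a single node covers it with room to spare.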
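The arithmetic behind these figures can be checked with a short sketch. The 4 KiB per-entry budget is an assumption that happens to reproduce the ~4M figure; the function names are hypothetical.

```typescript
// Back-of-the-envelope cache sizing. All inputs are assumptions to tune.
function cacheCapacity(memoryBytes: number, bytesPerEntry: number): number {
  return Math.floor(memoryBytes / bytesPerEntry)
}

// Under an 80/20 split, the hot set is 20% of all stored URLs.
// The share is an integer percent to keep the arithmetic exact.
function hotSetSize(totalUrls: number, hotPercent = 20): number {
  return Math.ceil((totalUrls * hotPercent) / 100)
}

const GIB = 1024 ** 3
const capacity = cacheCapacity(16 * GIB, 4096) // assumed 4 KiB budget per mapping
const hotSet = hotSetSize(10_000_000)
console.log({ capacity, hotSet, fits: hotSet <= capacity })
```

Shrinking the per-entry budget to a few hundred bytes raises the capacity by an order of magnitude, so it is worth measuring real entry sizes before provisioning.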

Add a Read Replica

Distribute database read load across a primary and replica. Writes go to primary; reads go to replica.

What it does

A read replica is a database copy that receives updates from the primary and serves read-only queries. At a 100:1 read-to-write ratio, spreading reads across replicas increases read throughput roughly linearly with the number of replicas.

Why it matters

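A single Postgres instance tops out at ~3000 queries/sec depending on hardware. At 1000 RPS with a 100:1 read-write split, nearly all of that load is reads, so cache misses and traffic peaks can push a single instance toward its limit and make the database the first bottleneck. Replicas multiply read capacity.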

Trade-off

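Replication lag (typically < 10ms) means replicas serve slightly stale data. For URL shorteners this is acceptable because URLs rarely change. The one visible case is a redirect for a short code created moments ago, which a replica may not have yet.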
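A common way to handle a read that arrives before the row has replicated is to retry on the primary when the replica misses. The sketch below assumes a hypothetical `UrlStore` interface; it is not tied to any particular database driver.

```typescript
// Hypothetical minimal lookup interface; a real service would wrap a driver.
interface UrlStore {
  lookup(shortCode: string): Promise<string | null>
}

// Read from the replica first. On a miss, retry on the primary in case the
// row was written moments ago and has not replicated yet.
async function lookupWithFallback(
  replica: UrlStore,
  primary: UrlStore,
  shortCode: string
): Promise<string | null> {
  const fromReplica = await replica.lookup(shortCode)
  if (fromReplica !== null) return fromReplica
  return primary.lookup(shortCode)
}
```

The cost is that lookups for codes that do not exist at all also reach the primary, so this pattern works best when such misses are rare or rate-limited.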

Real world

Bit.ly uses read replicas for user analytics queries while keeping redirects in cache. Large services often deploy 3-5 replicas per primary.

Capacity math

Each replica adds ~3000 read queries/sec capacity. At 1000 RPS with a 100:1 ratio, about 990 requests/sec are reads, so you need at least 1 replica; for peak traffic, 2-3 replicas are recommended.
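The replica count follows from dividing the read load by the usable capacity of one replica. In this sketch the ~3000 queries/sec figure comes from the text, while the 50% utilization target and the 4x peak are assumptions added for illustration.

```typescript
// Replica count for a target read load. Per-replica capacity is the
// hardware-dependent ~3000 queries/sec figure from the text.
function replicasNeeded(
  readQps: number,
  perReplicaQps = 3000,
  targetUtilization = 0.5 // assumed headroom; tune per workload
): number {
  const usableQps = perReplicaQps * targetUtilization
  return Math.max(1, Math.ceil(readQps / usableQps))
}

console.log(replicasNeeded(990))  // steady state: 1000 RPS at 100:1 is ~990 reads/sec
console.log(replicasNeeded(4000)) // a hypothetical 4x peak
```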

Add a Worker Service for ID Generation

Pre-generate unique short codes in a background worker to avoid contention on the database during URL creation.

What it does

A background worker service pre-generates unique short codes (e.g., using base-62 encoding of a counter) and stores them in a pool. API servers consume from this pool during URL creation.
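Base-62 encoding of a counter can be sketched as follows. The digit order (digits, then uppercase, then lowercase) is an assumption; any fixed ordering works as long as the encoder and decoder agree.

```typescript
// Assumed digit order: 0-9, A-Z, a-z.
const BASE62 = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'

// Encode a non-negative integer counter as a base-62 string.
function encodeBase62(n: number): string {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError('counter must be a non-negative safe integer')
  }
  if (n === 0) return BASE62[0]
  let out = ''
  while (n > 0) {
    out = BASE62[n % 62] + out
    n = Math.floor(n / 62)
  }
  return out
}

// Inverse of encodeBase62, useful for checking round trips.
function decodeBase62(code: string): number {
  let n = 0
  for (let i = 0; i < code.length; i++) {
    const digit = BASE62.indexOf(code[i])
    if (digit < 0) throw new RangeError('invalid base-62 character: ' + code[i])
    n = n * 62 + digit
  }
  return n
}
```

Six base-62 characters cover 62^6 (about 56.8 billion) distinct counters. Because sequential counters produce guessable codes, some services shuffle or offset the counter before encoding.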

Why it matters

Generating unique IDs requires coordination. A naive counter has a single point of contention; a worker-based pool reduces lock contention and enables offline generation.

Trade-off

Requires an extra service and coordination mechanism. If ID generation fails, the pool depletes. Implement backpressure to stop accepting new URLs when the pool is low.
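The backpressure rule can be sketched with an in-process model of the pool. `CodePool` and its low-watermark parameter are hypothetical; in a real system the pool would live in a shared store such as a database table or a Redis list.

```typescript
// Hypothetical in-process model of the pre-generated code pool.
class CodePool {
  private codes: string[] = []
  private lowWatermark: number

  constructor(lowWatermark: number) {
    this.lowWatermark = lowWatermark
  }

  // Called by the background worker to top up the pool.
  refill(newCodes: string[]): void {
    for (const code of newCodes) this.codes.push(code)
  }

  // True when the worker is falling behind and creation should be throttled.
  isLow(): boolean {
    return this.codes.length <= this.lowWatermark
  }

  // Called by API servers during URL creation. Returns null while the pool is
  // at or below the low watermark so the caller can reject the request
  // (for example with HTTP 503) instead of draining the pool completely.
  take(): string | null {
    if (this.isLow()) return null
    const code = this.codes.shift()
    return code === undefined ? null : code
  }
}
```

Rejecting at a watermark rather than at empty keeps a reserve, which gives the worker time to recover before the pool is exhausted.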

Real world

YouTube uses a distributed ID generation service (Snowflake-style). Uber generates IDs in batches to reduce coordination overhead.

Capacity math

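A worker can pre-generate 100k IDs/sec. A pool of 1M pre-generated IDs lasts ~1000 seconds (about 17 minutes) even if all 1000 RPS were URL creations; at the ~10 creations/sec implied by a 100:1 read-to-write ratio, it lasts over a day.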

§2 Step 3 — Deep Dive

Approach	Read latency	Write complexity	Collision risk	Best for	Cost	Ops burden
Random ID (6 chars)	< 5ms with cache	Low	Low at small scale	Simple shorteners ✓	Low	Low
Base62 counter	< 5ms with cache	Low	None	Sequential, predictable IDs	Low	Low
MD5/SHA hash (truncated)	< 5ms with cache	Low	Yes, needs retry	Deduplication needed	Low	Low
Custom alias	< 5ms with cache	Medium	None (user-defined)	Branded short links	Low	Low
Snowflake ID	< 5ms with cache	High	None	Distributed, time-ordered	Low	Low

URL Shortener: short-code generation approaches and their trade-offs.
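The "Low at small scale" entry for random IDs can be quantified with the birthday approximation, p ≈ 1 − exp(−n(n−1)/(2N)), where n is the number of codes issued and N is the size of the code space. The sketch below assumes codes are drawn uniformly at random from a 62-character alphabet.

```typescript
// Birthday-bound approximation: probability that at least two of n uniformly
// random codes collide in a space of alphabetSize^length possible codes.
function collisionProbability(n: number, alphabetSize: number, length: number): number {
  const space = Math.pow(alphabetSize, length)
  const exponent = (n * (n - 1)) / (2 * space)
  // -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
  return -Math.expm1(-exponent)
}

console.log(collisionProbability(1_000, 62, 6))
console.log(collisionProbability(100_000, 62, 6))
console.log(collisionProbability(1_000_000, 62, 6))
```

Under these assumptions the chance of any collision is negligible for a thousand codes but approaches certainty around a million, which is why random IDs need a uniqueness check and retry as the table grows.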

TypeScript: URL Shortener redirect with Redis cache + Postgres fallback

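import { randomInt } from 'node:crypto'
import { createClient } from 'redis'
import { Pool } from 'pg'

const redis = createClient()
// node-redis v4+ must be connected explicitly before commands are issued
const redisReady = redis.connect()
const pg = new Pool({ connectionString: process.env.DATABASE_URL })

const ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'

export async function redirect(shortCode: string): Promise<string | null> {
  await redisReady

  // 1. Check Redis cache first (hot URLs serve in < 1ms)
  const cached = await redis.get(`url:${shortCode}`)
  if (cached) return cached

  // 2. Fallback to Postgres
  const { rows } = await pg.query(
    'SELECT original_url FROM urls WHERE short_code = $1',
    [shortCode]
  )
  if (!rows[0]) return null

  // 3. Populate cache with 24h TTL
  await redis.setEx(`url:${shortCode}`, 86400, rows[0].original_url)
  return rows[0].original_url
}

// Draw each character from a CSPRNG. Math.random().toString(36).slice(2, 8)
// is predictable and can return fewer than 6 characters.
function randomCode(length = 6): string {
  let code = ''
  for (let i = 0; i < length; i++) code += ALPHABET[randomInt(ALPHABET.length)]
  return code
}

export async function shorten(originalUrl: string): Promise<string> {
  // Assumes a UNIQUE constraint on urls.short_code (not shown in the lesson),
  // so a colliding random code is retried instead of failing the request.
  for (let attempt = 0; attempt < 5; attempt++) {
    const shortCode = randomCode()
    try {
      await pg.query(
        'INSERT INTO urls (short_code, original_url) VALUES ($1, $2)',
        [shortCode, originalUrl]
      )
      return shortCode
    } catch (err) {
      // 23505 is Postgres unique_violation; anything else is a real failure
      if ((err as { code?: string }).code !== '23505') throw err
    }
  }
  throw new Error('could not allocate a unique short code')
}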

Component	Why Add It	Tradeoff
Redis Cache	URL expansion is a hot-path read.	Cache invalidation is necessary if a URL is modified or deleted.
Read Replica	A single Postgres instance tops out at ~3000 queries/sec depending on hardware.	Replication lag (typically < 10ms) means replicas serve slightly stale data.
Worker Service for ID Generation	Generating unique IDs requires coordination.	Requires an extra service and coordination mechanism.

Design decision tradeoffs

Database Failure

Primary database goes down. Can users still access previously shortened URLs from cache?

Cache Failure & Stampede

Redis cache crashes. Thousands of concurrent redirect requests hit the database simultaneously. Does the system handle the load spike without cascading failure?
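One common mitigation for this scenario is request coalescing, often called single-flight: concurrent misses for the same key share one database query. The `SingleFlight` class below is a hypothetical sketch of that idea, not part of any library used in this lesson.

```typescript
// Coalesce concurrent loads of the same key into one backend call, so a cold
// or restarted cache does not multiply database load.
class SingleFlight<T> {
  private inFlight = new Map<string, Promise<T>>()

  run(key: string, load: () => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key)
    if (existing) return existing

    const pending = load()
    // Clear the slot once the load settles (success or failure) so that later
    // requests trigger a fresh load instead of reusing an old result.
    const cleanup = () => {
      this.inFlight.delete(key)
    }
    pending.then(cleanup, cleanup)

    this.inFlight.set(key, pending)
    return pending
  }
}
```

In a redirect handler, the database fallback would be wrapped as `flights.run(shortCode, () => queryDatabase(shortCode))`, where `flights` and `queryDatabase` are placeholder names. This caps the load at one query per distinct hot key rather than one per request.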

Traffic Surge

A viral URL creates a 10x traffic spike (10k RPS) and the load balancer is overwhelmed. Does the system gracefully degrade or queue requests? What's the max sustainable load?

URL shortening has two operations: create (write a mapping) and expand (read a mapping). Reads dominate — 100:1 read-to-write ratio is typical. Add caching in front of the database.

The bottleneck is the database. At 1000 RPS (mostly reads), a single database can saturate once cache misses and traffic peaks are counted. Add a read replica to distribute read traffic across multiple nodes.

Use Redis for hot URLs. 80% of traffic hits 20% of URLs. Cache the most-accessed mappings to serve redirects from memory in < 1ms instead of 10-50ms database queries.

§3 Step 4 — Wrap Up

Decision	Choice	Why
Redis Cache	Redis is an in-memory cache that stores key-value pairs for sub-millisecond lookup times.	URL expansion is a hot-path read.
Read Replica	A read replica is a database copy that receives updates from the primary and serves read-only queries.	A single Postgres instance tops out at ~3000 queries/sec depending on hardware.
Worker Service for ID Generation	A background worker service pre-generates unique short codes and stores them in a pool.	Generating unique IDs requires coordination.

Key design decisions

If the interviewer asks to scale 10×, from prototype to planet-scale: introduce consistent hashing so that adding or removing nodes redistributes load while remapping only a small share of cache keys or shards.
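Consistent hashing can be sketched as a sorted ring of virtual nodes. The 32-bit FNV-1a hash, the virtual-node count, and the `HashRing` API here are illustrative assumptions; production systems usually pick a stronger hash and tune the virtual-node count.

```typescript
// 32-bit FNV-1a over UTF-16 code units, chosen only because it is short.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash >>> 0
}

class HashRing {
  private ring: { point: number; node: string }[] = []
  private vnodes: number

  constructor(vnodes = 100) {
    this.vnodes = vnodes
  }

  // Each physical node is placed on the ring at `vnodes` pseudo-random points.
  addNode(node: string): void {
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.push({ point: fnv1a(node + '#' + i), node })
    }
    this.ring.sort((a, b) => a.point - b.point)
  }

  removeNode(node: string): void {
    this.ring = this.ring.filter((entry) => entry.node !== node)
  }

  // A key is owned by the first virtual node at or after its hash, wrapping
  // around to the start of the ring. Binary search keeps lookups O(log n).
  getNode(key: string): string | null {
    if (this.ring.length === 0) return null
    const h = fnv1a(key)
    let lo = 0
    let hi = this.ring.length
    while (lo < hi) {
      const mid = (lo + hi) >>> 1
      if (this.ring[mid].point < h) lo = mid + 1
      else hi = mid
    }
    return this.ring[lo % this.ring.length].node
  }
}
```

When a node is added, a key either keeps its current owner or moves to the new node; it never moves between two existing nodes. That property is what limits cache and shard remapping during scale-out.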

10× target: 10K RPS, where your architecture must hold.
