E-Commerce Platform
§1 Step 2 — High-Level Design
Design a product catalog, shopping cart, and checkout system that survives Black Friday spikes.
Store cart sessions in Redis (fast read/write) and cache product catalog data. Connect cart service exclusively to Redis — no Postgres for cart reads.
Redis stores cart sessions as hashes keyed by session ID. Product catalog data is cached with a TTL. The cart service reads and writes only Redis; Postgres remains the system of record.
Cart operations happen on every page interaction (add item, update quantity, checkout). Postgres for every cart read would be a bottleneck; Redis serves these at < 1ms.
If Redis goes down without persistence, all cart sessions are lost. Mitigate with Redis persistence (AOF) or a dual write of cart data to Postgres. A session is recreated on the next login, but the cart contents come back only if they were persisted.
Amazon uses DynamoDB (similar to Redis for this use case) for cart storage. eBay caches product listings in Memcached with 10-minute TTLs.
Redis stores 100K active carts at ~1KB each = 100MB — trivial. Product catalog cache for 1M SKUs at 500B each = 500MB, fits in a single Redis node.
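The sizing arithmetic above can be checked with a quick back-of-envelope calculation. This is a minimal sketch using decimal units (1 MB = 1,000,000 bytes), as the figures in the text do. The function name `estimateMemoryMB` is illustrative, and the result ignores Redis per-key overhead and replicas, which would add to the real footprint.

```typescript
// Back-of-envelope memory sizing in decimal units (1 MB = 1,000,000 bytes).
// Ignores Redis per-key overhead, fragmentation, and replicas.
function estimateMemoryMB(itemCount: number, bytesPerItem: number): number {
  return (itemCount * bytesPerItem) / 1_000_000
}

const cartMB = estimateMemoryMB(100_000, 1_000)     // 100K carts at ~1 KB each
const catalogMB = estimateMemoryMB(1_000_000, 500)  // 1M SKUs at ~500 B each

console.log(`carts: ${cartMB} MB, catalog: ${catalogMB} MB, total: ${cartMB + catalogMB} MB`)
```

Both data sets together come to roughly 600 MB under these assumptions, which is consistent with the single-node claim.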
When an order is placed, publish to Kafka. Consumers handle payment authorization, inventory reservation, and fulfillment initiation asynchronously.
Order service publishes an OrderPlaced event to Kafka. Multiple consumers process it: PaymentService charges the card, InventoryService reserves stock, EmailService sends confirmation.
Synchronous order processing (charge + deduct + notify in one request) is slow and brittle. Any one step failing rolls back the whole order. Kafka enables independent retries.
Async processing means the user sees 'Order Confirmed' before payment is actually charged. If payment fails later, you must send a cancellation — a worse UX than failing synchronously.
Amazon uses SQS/SNS for order processing fan-out. Shopify uses Kafka to process 1M+ orders/day through their fulfillment pipeline.
A well-provisioned Kafka cluster can handle 1M+ order events/second. Order-processing consumers scale independently: the payment service might need 10 consumers while email needs 2.
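Independent retries mean a consumer may see the same OrderPlaced event more than once, so each handler needs to be safe to re-run. Below is a minimal sketch of an idempotent handler, assuming each event carries a unique `orderId` and that events for one order are processed sequentially (as they would be when keyed to a single partition). The `OrderPlaced` shape, `makeIdempotentHandler`, and the in-memory `Set` are illustrative stand-ins; a real service would record processed IDs in a durable store.

```typescript
interface OrderPlaced { orderId: string; userId: string; totalCents: number }

type Handler = (event: OrderPlaced) => Promise<void>

// Wraps a handler so that a redelivered event (same orderId) is skipped.
// The Set is an in-memory stand-in for a durable "processed IDs" table.
function makeIdempotentHandler(handle: Handler): Handler {
  const processed = new Set<string>()
  return async (event) => {
    if (processed.has(event.orderId)) return // duplicate delivery: do nothing
    await handle(event)                       // if this throws, the ID is not recorded, so a retry runs again
    processed.add(event.orderId)
  }
}

// Example: a hypothetical payment handler that records each charge it makes.
const charges: string[] = []
const chargeOnce = makeIdempotentHandler(async (e) => { charges.push(e.orderId) })
```

Calling `chargeOnce` twice with the same event results in a single entry in `charges`, which is the property that protects against double-charging on redelivery.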
Move product images and videos to object storage with CDN delivery, eliminating media bandwidth from your application servers.
Product images are stored in object storage (S3). Product service returns image URLs in API responses. CDN delivers images directly from edge nodes near users.
Serving images through application servers wastes compute, bandwidth, and memory. Object storage + CDN scales to millions of concurrent media requests at near-zero origin load.
URL expiry adds complexity for private/restricted content. Public product images can use permanent CDN URLs; user-uploaded content (reviews, returns photos) needs signed URLs.
Etsy uses S3 + Fastly CDN for product images. Shopify serves all merchant product media through their CDN, processing 40TB+ of image uploads per day.
S3 has no practical limit on object count. At 100K DAU viewing 20 products each, that's 2M product page loads/day; at 10 images per page, that's 20M image requests/day. A CDN can absorb this with a 98%+ cache hit rate.
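For the restricted content mentioned above, a signed URL embeds an expiry time and a signature that can be verified without a database lookup. The sketch below is a generic HMAC scheme for illustration only; it is not the signing format of S3 or of any particular CDN. The names `signUrl` and `verifyUrl`, the query parameter names, and the secret are all assumptions.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

const SECRET = 'replace-with-a-real-secret' // hypothetical shared secret

function sign(path: string, expires: number): string {
  return createHmac('sha256', SECRET).update(`${path}:${expires}`).digest('hex')
}

// Returns the path with an expiry timestamp (seconds since epoch) and a signature.
function signUrl(path: string, ttlSeconds: number, nowSeconds: number): string {
  const expires = nowSeconds + ttlSeconds
  return `${path}?expires=${expires}&sig=${sign(path, expires)}`
}

// Accepts the URL only if it has not expired and the signature matches.
function verifyUrl(url: string, nowSeconds: number): boolean {
  const [path, query = ''] = url.split('?')
  const params = new URLSearchParams(query)
  const expires = Number(params.get('expires'))
  const sig = params.get('sig') ?? ''
  if (!Number.isFinite(expires) || nowSeconds > expires) return false
  const expected = Buffer.from(sign(path, expires), 'hex')
  const given = Buffer.from(sig, 'hex')
  // Constant-time comparison; lengths must match before calling timingSafeEqual.
  return given.length === expected.length && timingSafeEqual(given, expected)
}
```

Public product images would skip this step entirely and use permanent URLs, so the signing cost applies only to user-uploaded content.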
§2 Step 3 — Deep Dive
Redis stores cart sessions as hashes keyed by session ID. Product catalog data is cached with a TTL. The cart service reads and writes only Redis; Postgres remains the system of record.
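One way to realize the hash layout described above is one Redis hash per session, with the product ID as the field and the quantity as the value, so that a quantity change becomes a single atomic `HINCRBY`. This is a minimal sketch, assuming a client that exposes `hIncrBy`, `hGetAll`, and `expire` (the method names used by node-redis v4). The `HashStore` interface and the function names are illustrative, and prices are left out on the assumption that they are looked up from the catalog at checkout.

```typescript
// Minimal slice of a Redis client; node-redis v4 exposes methods with these names.
interface HashStore {
  hIncrBy(key: string, field: string, increment: number): Promise<number>
  hGetAll(key: string): Promise<Record<string, string>>
  expire(key: string, seconds: number): Promise<unknown>
}

const CART_TTL_SECONDS = 7 * 24 * 60 * 60 // 7 days

// Adds qty of a product to the cart. HINCRBY creates the hash and field if missing.
async function addItem(store: HashStore, sessionId: string, productId: string, qty: number): Promise<number> {
  const key = `cart:${sessionId}`
  const newQty = await store.hIncrBy(key, productId, qty)
  await store.expire(key, CART_TTL_SECONDS) // refresh the TTL on every write
  return newQty
}

// Reads the cart back as productId -> quantity (Redis returns hash values as strings).
async function getCart(store: HashStore, sessionId: string): Promise<Record<string, number>> {
  const raw = await store.hGetAll(`cart:${sessionId}`)
  const cart: Record<string, number> = {}
  for (const id of Object.keys(raw)) cart[id] = Number(raw[id])
  return cart
}
```

Because the increment happens inside Redis, two concurrent "add to cart" requests for the same session cannot overwrite each other, which a read-modify-write of a serialized cart cannot guarantee.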
| Approach | Read latency | Durability | Scalability | Best for | Cost | Ops burden |
|---|---|---|---|---|---|---|
| Postgres only | 5–20ms | High | Medium | Small scale, ACID needed | Low | Low |
| Redis only | < 1ms | Low (TTL expiry) | High | Ephemeral sessions | Medium | Low |
| Redis + async Postgres sync | < 1ms read | High | High | Production e-commerce ✓ | Medium | Medium |
| Browser localStorage | < 1ms | Medium (browser) | Infinite | Guest checkout | Low | Low |
| DynamoDB | < 5ms | High | Very high | Global cart, AWS-native | High | Low |
Cart storage strategies — Redis wins for speed; dual-write for durability.
```typescript
import { randomUUID } from 'node:crypto'
import { createClient } from 'redis'
import { Kafka } from 'kafkajs'

const redis = createClient()
const kafka = new Kafka({ brokers: ['kafka:9092'] })
const producer = kafka.producer()

// Connect both clients once at startup, before serving requests
export async function connect(): Promise<void> {
  await Promise.all([redis.connect(), producer.connect()])
}

interface CartItem { productId: string; qty: number; price: number }

export async function addToCart(sessionId: string, item: CartItem): Promise<void> {
  const key = `cart:${sessionId}`

  // Read current cart (stored as a single JSON string).
  // This read-modify-write is not atomic: concurrent updates to one cart can overwrite each other.
  const raw = await redis.get(key)
  const cart: CartItem[] = raw ? JSON.parse(raw) : []

  // Immutable update: bump the quantity of an existing item, or append a new one
  const updated = cart.some(i => i.productId === item.productId)
    ? cart.map(i => i.productId === item.productId ? { ...i, qty: i.qty + item.qty } : i)
    : [...cart, item]

  // Write to Redis with 7-day TTL
  await redis.setEx(key, 604800, JSON.stringify(updated))

  // Publish an event so a consumer can persist the cart to Postgres (the Postgres write stays off the hot path)
  await producer.send({
    topic: 'cart-updated',
    messages: [{ key: sessionId, value: JSON.stringify({ sessionId, cart: updated }) }],
  })
}

export async function checkout(sessionId: string, userId: string): Promise<string> {
  const raw = await redis.get(`cart:${sessionId}`)
  if (!raw) throw new Error('Cart not found')

  const orderId = randomUUID()
  await producer.send({
    topic: 'order-placed',
    messages: [{ key: orderId, value: JSON.stringify({ orderId, userId, cart: JSON.parse(raw) }) }],
  })

  // Clear the cart only after the order event has been accepted by Kafka
  await redis.del(`cart:${sessionId}`)
  return orderId
}
```

| Component | Why Add It | Tradeoff |
|---|---|---|
| Redis for Cart and Catalog Caching | Cart operations happen on every page interaction (add item, update quantity, checkout). | If Redis goes down, all cart sessions are lost. |
| Kafka for Async Order Processing | Synchronous order processing (charge + deduct + notify in one request) is slow and brittle. | Async processing means the user sees 'Order Confirmed' before payment is actually charged. |
| Object Storage for Product Media | Serving images through application servers wastes compute, bandwidth, and memory. | URL expiry adds complexity for private/restricted content. |
Design decision tradeoffs
Postgres goes down during a flash sale. Can product browsing continue from cache? What happens to in-flight orders?
Order service becomes unreachable from the API gateway (network partition). Incoming checkout requests time out. Should the gateway fail closed (reject orders) or fail open (queue orders for later processing)? How do we prevent double-charging when the order service recovers?
Flash sale drives 10K concurrent product page requests to a single product. Product service CPU maxes out. All product browsing slows to 5+ seconds. What caching strategy prevents this? How do you handle cache invalidation when prices change mid-sale?
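For the hot-product scenario above, one common mitigation is request coalescing (sometimes called single-flight): concurrent cache misses for the same key share one backend call instead of each hitting the product service. Below is a minimal in-process sketch, assuming a short TTL is acceptable for prices. `SingleFlightCache` and its parameters are illustrative, and a multi-instance deployment would still need a shared cache such as Redis in front of the product service.

```typescript
type Loader<T> = (key: string) => Promise<T>

class SingleFlightCache<T> {
  private values = new Map<string, { value: T; expiresAt: number }>()
  private inFlight = new Map<string, Promise<T>>()
  private load: Loader<T>
  private ttlMs: number

  constructor(load: Loader<T>, ttlMs: number) {
    this.load = load
    this.ttlMs = ttlMs
  }

  async get(key: string, now: number = Date.now()): Promise<T> {
    const hit = this.values.get(key)
    if (hit && hit.expiresAt > now) return hit.value

    // If a load for this key is already running, wait for it instead of starting another.
    const pending = this.inFlight.get(key)
    if (pending) return pending

    const p = this.load(key)
      .then((value) => {
        this.values.set(key, { value, expiresAt: now + this.ttlMs })
        return value
      })
      .finally(() => this.inFlight.delete(key))
    this.inFlight.set(key, p)
    return p
  }

  // Call when a price changes so the next read reloads from the source.
  invalidate(key: string): void {
    this.values.delete(key)
  }
}
```

With this in place, 10K simultaneous requests for one product on a single instance trigger one load rather than 10K, and a price change mid-sale can be pushed through by calling `invalidate` for that product.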
§3 Step 4 — Wrap Up
| Decision | Choice | Why |
|---|---|---|
| Redis for Cart and Catalog Caching | Redis stores cart sessions as hashes keyed by session ID. | Cart operations happen on every page interaction (add item, update quantity, checkout). |
| Kafka for Async Order Processing | Order service publishes an OrderPlaced event to Kafka. | Synchronous order processing (charge + deduct + notify in one request) is slow and brittle. |
| Object Storage for Product Media | Product images are stored in object storage (S3). | Serving images through application servers wastes compute, bandwidth, and memory. |
Key design decisions