Hotel Reservation System

Intermediate · Transactions · Concurrency

Commerce·50 min read

Step 2: High-Level Design

Handle concurrent booking with optimistic locking. Race conditions, overbooking prevention, and idempotency.

Add Redis for Room Availability Cache

Add Redis to cache room availability so search queries don't hammer the database on every lookup.

What it does

Redis caches room availability data (which rooms are available on which dates) so search queries are answered from memory without a database round-trip. The cache stores a map from hotel and date to available room count, updated whenever a booking or cancellation occurs. Because searches outnumber bookings roughly 100 to 1, serving them from the cache removes about 99% of availability-related load from the database.

Why it matters

Hotel availability is read-heavy (100 searches per booking). Without caching, every search query hits Postgres, which must join rooms and reservations to check availability for the requested dates. At 1,000 bookings/second, this translates to 100K availability queries/second hitting the database. Redis serves these lookups from memory at sub-millisecond latency, preventing database saturation during flash sales.

Trade-off

Cache invalidation is critical. When a room is booked, the cache must be updated together with the database write (write-through pattern). A window of stale availability can lead to double-booking: if the cache shows a room as available just after it was booked, two guests could both complete their bookings. Postgres and Redis cannot share a single transaction, so the two writes have to be treated as one logical unit: INSERT into reservations (database) and DECR available_rooms (cache) in the same request path, with the database as the final check so that a stale cache entry produces a rejected booking rather than a double booking.
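One way to keep a stale cache from causing a double booking is to let the database decide and refresh the cache only after the commit. The sketch below is a minimal illustration under stated assumptions: `sqlite3` stands in for Postgres, a plain dict stands in for Redis, and the `inventory` table and `book` function are hypothetical names, not part of the lesson's system.

```python
import sqlite3

# Stand-ins (assumptions): sqlite3 plays the role of Postgres and a dict
# plays the role of Redis, so the sketch runs without external services.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE inventory (hotel_date TEXT PRIMARY KEY, available INTEGER NOT NULL)"
)
db.execute("INSERT INTO inventory VALUES ('42:2024-12-25', 1)")
db.commit()
cache = {"42:2024-12-25": 1}

def book(hotel_date: str) -> bool:
    """Decrement in the database first; touch the cache only after commit."""
    with db:  # commits on success, rolls back on exception
        cur = db.execute(
            "UPDATE inventory SET available = available - 1 "
            "WHERE hotel_date = ? AND available > 0",
            (hotel_date,),
        )
        if cur.rowcount == 0:
            return False  # the database says sold out, whatever the cache shows
    # The database write is committed here; now refresh the cached count.
    row = db.execute(
        "SELECT available FROM inventory WHERE hotel_date = ?", (hotel_date,)
    ).fetchone()
    cache[hotel_date] = row[0]
    return True
```

With this ordering, a stale cache can send a guest into a booking attempt that fails, but it cannot produce a second committed booking for the last room.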

Real world

Booking.com serves 1.5M+ room night searches/day using Redis for availability. Expedia caches hotel inventory with TTL-based refresh. Airbnb uses a custom availability calendar cache with per-listing bloom filters to track booked dates. All major platforms use caching to handle flash sales.

Capacity math

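10K hotels × 365 days × 100 rooms × 8 bytes ≈ 2.9 GB in Redis, which fits on one node. Updates: 1K bookings/second means 1K Redis writes/second. A single Redis node handles 100K+ operations/second, so writes use about 1% of its capacity. The 100K availability reads/second from search are the larger share of the load.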
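The arithmetic above can be checked with a few lines of Python. The variable names are illustrative; the figures are the ones stated in the capacity estimate.

```python
# Memory estimate for the availability cache.
hotels = 10_000
days = 365
rooms_per_hotel = 100
bytes_per_entry = 8

cache_bytes = hotels * days * rooms_per_hotel * bytes_per_entry
cache_gb = cache_bytes / 1e9  # decimal gigabytes

# Write load relative to a 100K ops/second Redis node.
bookings_per_sec = 1_000
redis_ops_per_sec = 100_000
write_utilization = bookings_per_sec / redis_ops_per_sec

print(f"cache size: {cache_gb:.2f} GB")
print(f"write utilization: {write_utilization:.0%}")
```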

Add a Message Queue for Booking Events

Add Kafka to propagate booking events to downstream systems: email confirmation, loyalty points, channel managers.

What it does

Kafka carries booking events to all downstream consumers: confirmation email workers, loyalty program updaters, OTA (Online Travel Agency) channel sync, and analytics pipelines. When a booking is successfully committed to the database, the Booking API publishes a BookingConfirmed event to Kafka, which contains the full booking details (guest, room, dates, price, payment confirmation). Multiple independent consumers subscribe to this event and process it asynchronously: the EmailService sends a confirmation email within seconds, the LoyaltyService credits points, the ChannelManager syncs the inventory to Booking.com and Expedia, and the Analytics Service logs the booking for reporting.

Why it matters

The booking API should complete the reservation atomically in Postgres and return to the client within 200ms. Sending email, updating loyalty points, and syncing with Expedia are slow external calls that don't need to be synchronous. Without async event propagation, if the email service is slow, all booking API responses would lag. Kafka decouples these concerns: the API writes to Postgres and publishes to Kafka (< 100ms total), then returns immediately while downstream workers handle the rest in the background.

Trade-off

Downstream consumers may lag behind bookings. If a hotel updates rates in the channel manager 5 seconds after a booking, the OTA channel might accept a new booking at the wrong rate if the Kafka message hasn't been consumed yet. This creates a window of inconsistency. The solution is event ordering: partition Kafka by hotel ID so all events for a hotel are processed in order. Additionally, implement idempotent consumption: if the same BookingConfirmed event is processed twice (due to Kafka retries), the handler should detect this via a unique event ID and skip the duplicate.
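The two mitigations above, keying events by hotel ID and skipping duplicates by event ID, can be sketched without a broker. Everything here is a hypothetical stand-in: `partition_for` uses CRC32 only to show that a stable hash keeps a hotel on one partition (Kafka's producer applies its own key hashing), and `LoyaltyHandler` keeps seen IDs in memory where a real consumer would need a durable store.

```python
import zlib

NUM_PARTITIONS = 8  # hypothetical partition count

def partition_for(hotel_id: str) -> int:
    """Map a hotel to a partition with a stable hash, so all of its
    events land on the same partition and are consumed in order."""
    return zlib.crc32(hotel_id.encode("utf-8")) % NUM_PARTITIONS

class LoyaltyHandler:
    """Idempotent consumer: each event ID is applied at most once."""

    def __init__(self) -> None:
        self.seen_event_ids: set[str] = set()  # durable store in practice
        self.points: dict[str, int] = {}

    def handle(self, event: dict) -> bool:
        if event["event_id"] in self.seen_event_ids:
            return False  # redelivery: skip without crediting again
        guest = event["guest_id"]
        self.points[guest] = self.points.get(guest, 0) + event["points"]
        self.seen_event_ids.add(event["event_id"])
        return True

handler = LoyaltyHandler()
event = {"event_id": "evt-1", "hotel_id": "hotel-42", "guest_id": "g1", "points": 100}
handler.handle(event)
handler.handle(event)  # simulated retry; points are credited only once
```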

Real world

Marriott, Hilton, and IHG all use event-driven architectures for booking propagation. Channel managers (SiteMinder, RateGain, Cloudbeds) receive booking events via webhooks or message queues and sync inventory within seconds. Airbnb publishes booking events to Kafka for notifications, payments, guest reputation scoring, and calendar sync. Stripe's payments API triggers webhooks (a form of async event propagation) for order fulfillment.

Capacity math

1M bookings/day ≈ 12/second on average, with 100/second at peak during flash sales. Kafka handles this with ample headroom: a single broker can sustain a message rate several orders of magnitude higher, depending on message size and hardware. Each message carries full booking details (~5 KB), so peak throughput is about 500 KB/second, or roughly 5 GB/day. At that volume, months of booking events can be retained on a single partition.
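As a quick check of these figures, the same estimate in Python (variable names are illustrative, and 1 KB is taken as 1,000 bytes):

```python
bookings_per_day = 1_000_000
seconds_per_day = 86_400

avg_per_sec = bookings_per_day / seconds_per_day  # average booking rate
peak_per_sec = 100                                # stated flash-sale peak
message_kb = 5                                    # approximate event size

peak_kb_per_sec = peak_per_sec * message_kb       # peak Kafka ingress
daily_gb = bookings_per_day * message_kb / 1e6    # storage added per day

print(f"average: {avg_per_sec:.1f} bookings/second")
print(f"peak throughput: {peak_kb_per_sec} KB/second")
print(f"daily volume: {daily_gb:.1f} GB")
```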

Step 3: Deep Dive

Strategy	Throughput	Race condition safe?	Complexity	Best for	Cost	Ops burden
SELECT FOR UPDATE (pessimistic)	~500 TPS	Yes	Low	Low-volume, strong guarantee	Low	Low
Optimistic locking (version field)	~5K TPS	Yes (retry on conflict)	Low	Medium traffic, low contention	Low	Low
Redis DECRBY + Lua	~50K TPS	Yes (atomic)	Medium	High-traffic hotel/seat booking ✓	Medium	Medium
Database trigger + constraint	~1K TPS	Yes	Low	DB-enforced, simple schema	Low	Low
Two-phase commit	~200 TPS	Yes	High	Multi-DB, distributed inventory	Medium	High

Overbooking prevention strategies — Redis atomic decrement wins on throughput.

Python: hotel booking with a Redis Lua atomic availability check and reserve

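import redis

# Assumes a Redis server reachable at the redis-py defaults (localhost:6379).
r = redis.Redis()

# The Lua script runs atomically inside Redis, so the availability check
# and the decrement cannot interleave with another client's request.
RESERVE_SCRIPT = """
local key      = KEYS[1]
local rooms    = tonumber(redis.call('GET', key) or 0)
local quantity = tonumber(ARGV[1])
if rooms >= quantity then
    redis.call('DECRBY', key, quantity)
    return 1
else
    return 0
end
"""

# Register the script once at module load rather than on every call.
reserve_script = r.register_script(RESERVE_SCRIPT)

def check_and_reserve(hotel_id: str, room_type: str,
                      date: str, quantity: int = 1) -> bool:
    """Return True if `quantity` rooms were reserved, False if sold out."""
    key = f"avail:{hotel_id}:{room_type}:{date}"
    return bool(reserve_script(keys=[key], args=[quantity]))

def initialize_availability(hotel_id: str, room_type: str,
                            date: str, total_rooms: int) -> None:
    """Seed the availability counter with a 90-day expiry."""
    r.set(f"avail:{hotel_id}:{room_type}:{date}", total_rooms, ex=90 * 86400)

initialize_availability("42", "standard", "2024-12-25", 50)
success = check_and_reserve("42", "standard", "2024-12-25")
print("Reserved!" if success else "Sold out!")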

Component	Why Add It	Tradeoff
Redis for Room Availability Cache	Hotel availability is read-heavy (100 searches per booking).	Cache invalidation is critical.
Message Queue for Booking Events	The booking API should complete the reservation atomically in Postgres and return to the client within 200ms.	Downstream consumers may lag behind bookings.

Design decision tradeoffs

Database Failure

Hotel database goes down during a flash sale. What happens to in-flight bookings? Should the booking API fall back to Redis cache and queue requests for retry? How do you prevent double-booking when the database comes back online?

Booking API Isolation

Booking API becomes unreachable from clients (network partition). Clients retry booking requests. When the API reconnects, it must reconcile which bookings were actually committed to the database vs. which were only attempted. How do you handle duplicate booking attempts without double-booking a room?
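One common answer to the duplicate-attempt question is an idempotency key that the client generates once and resends on every retry, enforced by a uniqueness constraint. The sketch below is an assumption-laden illustration: `sqlite3` stands in for Postgres, and the `reservations` schema and `create_booking` function are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE reservations (
        idempotency_key TEXT PRIMARY KEY,  -- one row per logical request
        guest_id        TEXT NOT NULL,
        room_id         TEXT NOT NULL
    )
    """
)

def create_booking(idempotency_key: str, guest_id: str, room_id: str) -> str:
    """Insert the booking once; a retry with the same key is a no-op."""
    try:
        with db:  # commit on success, roll back on error
            db.execute(
                "INSERT INTO reservations VALUES (?, ?, ?)",
                (idempotency_key, guest_id, room_id),
            )
        return "created"
    except sqlite3.IntegrityError:
        # The key already exists: the earlier attempt was committed.
        return "duplicate"

first = create_booking("req-123", "guest-1", "room-7")
retry = create_booking("req-123", "guest-1", "room-7")
print(first, retry)
```

Because the constraint lives in the database, the outcome is the same whether the retry arrives before or after the API reconnects.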

Database Saturation During Flash Sale

Flash sale triggers 1000 concurrent booking requests for 10 available rooms. Database connection pool is exhausted, queries queue and timeout. How does the availability cache prevent this? What happens if the cache is stale and shows 100 available rooms when only 10 remain?

Cache Stampede on Availability Update

A room is booked, invalidating its cache entry for hotel:{hotelId}:rooms:{date}. Simultaneously, 1000 requests arrive for availability. All 1000 requests miss the cache and hit the database at once, causing a thundering herd. How do you prevent this spike?

Availability check and reservation must be atomic. Two requests for the same room can both pass the availability check if the check and insert are not atomic. Use SELECT FOR UPDATE to lock the room row during the transaction.

For flash sales: queue the booking requests through a Redis list (RPUSH / BLPOP). A reservation worker dequeues one at a time and processes serially for each room. This serializes access without database locks.
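The serialization idea can be sketched with an in-process queue. This is an illustration under assumptions: `queue.Queue` stands in for the Redis list (`put` playing the role of RPUSH and `get` the role of BLPOP), and `worker`, `run_flash_sale`, and the inventory dict are hypothetical names.

```python
import queue
import threading

booking_queue: queue.Queue = queue.Queue()  # stand-in for a Redis list
available = {"standard": 10}                # rooms left per room type
results: dict[str, bool] = {}

def worker() -> None:
    """Dequeue one request at a time, so no two bookings race."""
    while True:
        item = booking_queue.get()  # blocks until a request arrives
        if item is None:            # sentinel: stop the worker
            break
        guest_id, room_type = item
        if available[room_type] > 0:
            available[room_type] -= 1
            results[guest_id] = True
        else:
            results[guest_id] = False

def run_flash_sale(n_requests: int) -> int:
    """Enqueue n requests and return how many bookings succeeded."""
    t = threading.Thread(target=worker)
    t.start()
    for i in range(n_requests):
        booking_queue.put((f"guest-{i}", "standard"))
    booking_queue.put(None)
    t.join()
    return sum(results.values())
```

A single worker per room type keeps the check and the decrement sequential, which is what removes the need for a database lock; the cost is that throughput for a hot room type is bounded by that one worker.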

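Optimistic locking: add a version column to the rooms table. When booking, SELECT the row's version, then UPDATE ... SET version = version + 1 WHERE version = {read_version}. If no rows were updated, the version changed because another booking won the race, so re-read and retry. Only one writer can match a given version, so two bookings can never both succeed against the same snapshot.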
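A minimal sketch of the version-column pattern follows, assuming `sqlite3` as a stand-in for Postgres and a hypothetical `rooms` table that tracks an `available` count per room type; `try_book` is an illustrative name.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE rooms ("
    "id TEXT PRIMARY KEY, available INTEGER NOT NULL, version INTEGER NOT NULL)"
)
db.execute("INSERT INTO rooms VALUES ('standard', 1, 0)")
db.commit()

def try_book(room_id: str, max_retries: int = 3) -> bool:
    """Book one room using compare-and-set on the version column."""
    for _ in range(max_retries):
        available, version = db.execute(
            "SELECT available, version FROM rooms WHERE id = ?", (room_id,)
        ).fetchone()
        if available <= 0:
            return False  # sold out
        with db:  # commit on success, roll back on error
            cur = db.execute(
                "UPDATE rooms SET available = available - 1, "
                "version = version + 1 "
                "WHERE id = ? AND version = ?",
                (room_id, version),
            )
        if cur.rowcount == 1:
            return True  # our version still matched: booking committed
        # rowcount == 0: another writer bumped the version; re-read and retry
    return False
```

The retry limit is a design choice: under heavy contention most attempts lose the race, which is why the comparison table above places this approach at medium traffic with low contention.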

Step 4: Wrap Up

Decision	Choice	Why
Redis for Room Availability Cache	Redis caches room availability data — which rooms are available on which dates — so search queries return instantly without database round-trips.	Hotel availability is read-heavy (100 searches per booking).
Message Queue for Booking Events	Kafka carries booking events to all downstream consumers: confirmation email workers, loyalty program updaters, OTA (Online Travel Agency) channel sync, and analytics pipelines.	The booking API should complete the reservation atomically in Postgres and return to the client within 200ms.

Key design decisions

If the interviewer asks you to scale 10×: identify the single bottleneck first (usually the database write path) and address it before scaling horizontally.

10× target: 10K RPS, where your architecture must hold.