Hotel Reservation System
§1Step 2 — High-Level Design
Handle concurrent booking with optimistic locking. Race conditions, overbooking prevention, and idempotency.
Add Redis to cache room availability so search queries don't hammer the database on every lookup.
Redis caches room availability data (which rooms are available on which dates) so search queries return quickly without database round-trips. The cache stores a map of hotel-date to available room count, updated in real time whenever a booking or cancellation occurs. Because searches outnumber bookings roughly 100 to 1, serving them from the cache can take on the order of 99% of availability reads off the database.
Hotel availability is read-heavy (100 searches per booking). Without caching, every search query hits Postgres, which must join rooms and reservations and check availability for the requested dates. At 1,000 bookings/second, this translates to 100K availability queries/second hitting the database. Redis serves these at sub-millisecond latency, preventing database saturation during flash sales.
Cache invalidation is critical. When a room is booked, the cache must be updated together with the database write (write-through pattern). A window of stale availability can lead to double-booking: if the cache shows a room as available when it was just booked, two guests could both complete their bookings. The solution is to treat the two writes as one unit of work: INSERT into reservations (database) and DECR available_rooms (cache). Postgres and Redis cannot share a single transaction, so the database remains the source of truth and the cache entry is updated or dropped immediately after the commit.
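A minimal sketch of the write-through ordering described above. It assumes SQLite as a stand-in for Postgres and a plain dict as a stand-in for Redis; the table, key, and function names are hypothetical. The database transaction decides whether the booking succeeds, and the cache is adjusted only after the commit.

```python
import sqlite3

# Assumptions: SQLite stands in for Postgres, a dict stands in for Redis,
# and every table, key, and function name here is hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (hotel_id TEXT, date TEXT, available INTEGER, "
           "PRIMARY KEY (hotel_id, date))")
db.execute("CREATE TABLE reservations (id INTEGER PRIMARY KEY, hotel_id TEXT, "
           "date TEXT, guest TEXT)")
db.execute("INSERT INTO inventory VALUES ('42', '2024-12-25', 2)")
db.commit()

cache = {"avail:42:2024-12-25": 2}  # hotel-date -> available room count


def book(hotel_id: str, date: str, guest: str) -> bool:
    key = f"avail:{hotel_id}:{date}"
    try:
        with db:  # one DB transaction: commits on success, rolls back on error
            cur = db.execute(
                "UPDATE inventory SET available = available - 1 "
                "WHERE hotel_id = ? AND date = ? AND available > 0",
                (hotel_id, date))
            if cur.rowcount == 0:
                return False  # sold out; nothing was written
            db.execute(
                "INSERT INTO reservations (hotel_id, date, guest) VALUES (?, ?, ?)",
                (hotel_id, date, guest))
    except sqlite3.Error:
        return False
    # Write-through step: update the cache right after the commit.
    try:
        cache[key] = cache[key] - 1
    except KeyError:
        pass  # entry missing: the next read would repopulate it from the database
    return True
```

The conditional `UPDATE ... AND available > 0` is what actually prevents overbooking in this sketch; the cache only has to stay close enough to the database to keep search results accurate.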
Booking.com serves 1.5M+ room night searches/day using Redis for availability. Expedia caches hotel inventory with TTL-based refresh. Airbnb uses a custom availability calendar cache with per-listing bloom filters to track booked dates. All major platforms use caching to handle flash sales.
10K hotels × 365 days × 100 rooms × 8 bytes = 2.9GB Redis. Fits easily on one node. Updates: at 1K bookings/second, that's 1K Redis writes/second. Redis handles 100K+ operations/second, so this is 1% utilization.
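The sizing estimate above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope check using the figures from the text; it assumes decimal gigabytes and ignores Redis's per-key overhead, so the real footprint would be larger.

```python
# Figures taken from the estimate above; decimal GB (10**9 bytes) assumed.
hotels = 10_000
days = 365
rooms_per_hotel = 100
bytes_per_entry = 8

cache_bytes = hotels * days * rooms_per_hotel * bytes_per_entry
cache_gb = cache_bytes / 1e9

writes_per_sec = 1_000        # one cache write per booking
redis_ops_per_sec = 100_000   # capacity figure quoted in the text
utilization = writes_per_sec / redis_ops_per_sec

print(f"{cache_gb:.2f} GB, {utilization:.0%} utilization")
# prints "2.92 GB, 1% utilization"
```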
Add Kafka to propagate booking events to downstream systems: email confirmation, loyalty points, channel managers.
Kafka carries booking events to all downstream consumers: confirmation email workers, loyalty program updaters, OTA (Online Travel Agency) channel sync, and analytics pipelines. When a booking is successfully committed to the database, the Booking API publishes a BookingConfirmed event to Kafka, which contains the full booking details (guest, room, dates, price, payment confirmation). Multiple independent consumers subscribe to this event and process it asynchronously: the EmailService sends a confirmation email within seconds, the LoyaltyService credits points, the ChannelManager syncs the inventory to Booking.com and Expedia, and the Analytics Service logs the booking for reporting.
The booking API should complete the reservation atomically in Postgres and return to the client within 200ms. Sending email, updating loyalty points, and syncing with Expedia are slow external calls that don't need to be synchronous. Without async event propagation, if the email service is slow, all booking API responses would lag. Kafka decouples these concerns: the API writes to Postgres and publishes to Kafka (< 100ms total), then returns immediately while downstream workers handle the rest in the background.
Downstream consumers may lag behind bookings. If a hotel updates rates in the channel manager 5 seconds after a booking, the OTA channel might accept a new booking at the wrong rate if the Kafka message hasn't been consumed yet. This creates a window of inconsistency. The solution is event ordering: partition Kafka by hotel ID so all events for a hotel are processed in order. Additionally, implement idempotent consumption: if the same BookingConfirmed event is processed twice (due to Kafka retries), the handler should detect this via a unique event ID and skip the duplicate.
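The two mitigations above, keying by hotel ID and idempotent consumption, can be sketched without a running broker. Everything here is an illustrative assumption: the partition count, the event fields, and the in-memory set of processed IDs (a real consumer would keep that in durable storage). The CRC32 hash is a stand-in for the Kafka client's own key hashing.

```python
import zlib

NUM_PARTITIONS = 8  # hypothetical partition count


def partition_for(hotel_id: str) -> int:
    # A stable hash of the key, so every event for one hotel lands on the
    # same partition and is therefore consumed in order.
    return zlib.crc32(hotel_id.encode("utf-8")) % NUM_PARTITIONS


processed_event_ids = set()  # stand-in for a durable deduplication store
loyalty_points = {}          # guest -> points, the side effect being protected


def handle_booking_confirmed(event: dict) -> bool:
    """Apply the event once; return False if it was a duplicate delivery."""
    if event["event_id"] in processed_event_ids:
        return False
    guest = event["guest"]
    loyalty_points[guest] = loyalty_points.get(guest, 0) + event["points"]
    processed_event_ids.add(event["event_id"])
    return True
```

Delivering the same event twice, for example `{"event_id": "evt-1", "guest": "g1", "points": 100}`, credits the guest only once because the second call is skipped.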
Marriott, Hilton, and IHG all use event-driven architectures for booking propagation. Channel managers (SiteMinder, RateGain, Cloudbeds) receive booking events via webhooks or message queues and sync inventory within seconds. Airbnb publishes booking events to Kafka for notifications, payments, guest reputation scoring, and calendar sync. Stripe's payments API triggers webhooks (a form of async event propagation) for order fulfillment.
1M bookings/day = 12/second average, 100/second peak during flash sales. Kafka handles this trivially with room to spare. A single Kafka broker can handle 1M+ messages/second. The message contains full booking details (~5KB), so 500KB/second peak throughput. A single partition can retain months of booking events.
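As a quick check of the throughput figures above, the arithmetic can be written out. The daily volume line is an added derivation from the same inputs, assuming decimal units and the ~5KB message size given in the text.

```python
bookings_per_day = 1_000_000
avg_per_sec = bookings_per_day / 86_400          # seconds in a day

peak_per_sec = 100                               # flash-sale peak from the text
message_kb = 5                                   # approximate event size
peak_kb_per_sec = peak_per_sec * message_kb

daily_gb = bookings_per_day * message_kb / 1e6   # KB -> GB (decimal)

print(f"{avg_per_sec:.1f}/s average, {peak_kb_per_sec} KB/s peak, {daily_gb:.0f} GB/day")
# prints "11.6/s average, 500 KB/s peak, 5 GB/day"
```

At roughly 5 GB/day, retaining a few months of events comes to hundreds of gigabytes, which is consistent with the retention claim as long as the broker's disk is sized for it.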
§2Step 3 — Deep Dive
| Strategy | Throughput | Race condition safe? | Complexity | Best for | Cost | Ops burden |
|---|---|---|---|---|---|---|
| SELECT FOR UPDATE (pessimistic) | ~500 TPS | Yes | Low | Low-volume, strong guarantee | Low | Low |
| Optimistic locking (version field) | ~5K TPS | Yes (retry on conflict) | Low | Medium traffic, low contention | Low | Low |
| Redis DECRBY + Lua | ~50K TPS | Yes (atomic) | Medium | High-traffic hotel/seat booking ✓ | Medium | Medium |
| Database trigger + constraint | ~1K TPS | Yes | Low | DB-enforced, simple schema | Low | Low |
| Two-phase commit | ~200 TPS | Yes | High | Multi-DB, distributed inventory | Medium | High |
Overbooking prevention strategies — Redis atomic decrement wins on throughput.
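For comparison with the Redis approach, the optimistic locking row in the table can be sketched as a version-checked update. This is a sketch under assumptions: SQLite stands in for Postgres, and the `room_inventory` table, its columns, and the retry count are hypothetical.

```python
import sqlite3

# Assumptions: SQLite stands in for Postgres; table and column names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE room_inventory (hotel_id TEXT, date TEXT, "
           "available INTEGER, version INTEGER, PRIMARY KEY (hotel_id, date))")
db.execute("INSERT INTO room_inventory VALUES ('42', '2024-12-25', 1, 0)")
db.commit()


def reserve_optimistic(hotel_id: str, date: str, max_retries: int = 3) -> bool:
    for _ in range(max_retries):
        row = db.execute(
            "SELECT available, version FROM room_inventory "
            "WHERE hotel_id = ? AND date = ?", (hotel_id, date)).fetchone()
        if row is None or row[0] <= 0:
            return False  # unknown date or sold out
        available, version = row
        with db:
            # The write succeeds only if nobody bumped the version since our read.
            cur = db.execute(
                "UPDATE room_inventory SET available = ?, version = ? "
                "WHERE hotel_id = ? AND date = ? AND version = ?",
                (available - 1, version + 1, hotel_id, date, version))
        if cur.rowcount == 1:
            return True
        # rowcount == 0: a concurrent writer won the race; re-read and retry
    return False
```

No lock is held between the read and the write, which is why this approach suits low contention: under heavy contention most attempts lose the version check and retry.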
```python
import redis

r = redis.Redis()

# The Lua script runs atomically inside Redis, so the availability check
# and the decrement cannot interleave with another client's request.
RESERVE_SCRIPT = """
local key = KEYS[1]
local rooms = tonumber(redis.call('GET', key) or 0)
local quantity = tonumber(ARGV[1])
if rooms >= quantity then
    redis.call('DECRBY', key, quantity)
    return 1
else
    return 0
end
"""

# Register the script once instead of on every call.
reserve_script = r.register_script(RESERVE_SCRIPT)


def check_and_reserve(hotel_id: str, room_type: str,
                      date: str, quantity: int = 1) -> bool:
    """Atomically reserve `quantity` rooms; return False if too few remain."""
    key = f"avail:{hotel_id}:{room_type}:{date}"
    return bool(reserve_script(keys=[key], args=[quantity]))


def initialize_availability(hotel_id: str, room_type: str,
                            date: str, total_rooms: int) -> None:
    # Availability keys expire after 90 days.
    r.set(f"avail:{hotel_id}:{room_type}:{date}", total_rooms, ex=90 * 86400)


initialize_availability("42", "standard", "2024-12-25", 50)
success = check_and_reserve("42", "standard", "2024-12-25")
print("Reserved!" if success else "Sold out!")
```

| Component | Why Add It | Tradeoff |
|---|---|---|
| Redis for Room Availability Cache | Hotel availability is read-heavy (100 searches per booking). | Cache invalidation is critical. |
| Message Queue for Booking Events | The booking API should complete the reservation atomically in Postgres and return to the client within 200ms. | Downstream consumers may lag behind bookings. |
Design decision tradeoffs
Hotel database goes down during a flash sale. What happens to in-flight bookings? Should the booking API fall back to Redis cache and queue requests for retry? How do you prevent double-booking when the database comes back online?
Booking API becomes unreachable from clients (network partition). Clients retry booking requests. When the API reconnects, it must reconcile which bookings were actually committed to the database vs. which were only attempted. How do you handle duplicate booking attempts without double-booking a room?
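One common way to approach the retry question above is an idempotency key: the client generates a key per booking attempt and resends it on every retry, and a unique constraint makes the second write a no-op. The sketch below assumes SQLite as a stand-in for Postgres; the `bookings` table and `create_booking` function are hypothetical.

```python
import sqlite3

# Assumptions: SQLite stands in for Postgres; schema and names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE bookings (id INTEGER PRIMARY KEY, "
           "idempotency_key TEXT UNIQUE NOT NULL, room_id TEXT, guest TEXT)")
db.commit()


def create_booking(idempotency_key: str, room_id: str, guest: str) -> int:
    """Return the booking id; a retry with the same key returns the same id."""
    try:
        with db:
            cur = db.execute(
                "INSERT INTO bookings (idempotency_key, room_id, guest) "
                "VALUES (?, ?, ?)", (idempotency_key, room_id, guest))
            return cur.lastrowid
    except sqlite3.IntegrityError:
        # The key already exists: this is a retry of a committed booking.
        row = db.execute("SELECT id FROM bookings WHERE idempotency_key = ?",
                         (idempotency_key,)).fetchone()
        return row[0]
```

With this in place, reconciliation after a partition reduces to replaying the client's requests: attempts that were already committed return the existing booking rather than creating a second one.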
Flash sale triggers 1000 concurrent booking requests for 10 available rooms. The database connection pool is exhausted; queries queue and time out. How does the availability cache prevent this? What happens if the cache is stale and shows 100 available rooms when only 10 remain?
A room is booked, invalidating its cache entry for hotel:{hotelId}:rooms:{date}. Simultaneously, 1000 requests arrive for availability. All 1000 requests miss the cache and hit the database at once, causing a thundering herd. How do you prevent this spike?
§3Step 4 — Wrap Up
| Decision | Choice | Why |
|---|---|---|
| Redis for Room Availability Cache | Redis caches room availability data — which rooms are available on which dates — so search queries return instantly without database round-trips. | Hotel availability is read-heavy (100 searches per booking). |
| Message Queue for Booking Events | Kafka carries booking events to all downstream consumers: confirmation email workers, loyalty program updaters, OTA (Online Travel Agency) channel sync, and analytics pipelines. | The booking API should complete the reservation atomically in Postgres and return to the client within 200ms. |
Key design decisions