Ride-Sharing Backend

Intermediate · Geospatial · Realtime

Marketplace · 60 min read

§1 Step 2: High-Level Design

Match riders and drivers in real-time. Geospatial indexing, supply/demand, and driver location streaming.

Progressive build — add each component step by step

Add an API Gateway

Place an API Gateway to handle both driver and passenger app traffic, routing to the appropriate backend services.

What it does

The API Gateway routes traffic from driver and passenger mobile apps to the appropriate backend services, handling authentication and protocol routing.

Why it matters

Driver and passenger apps make different types of requests: location updates (high frequency, small payload), ride requests (low frequency, needs matching), and ride status (polling). The gateway routes each type to the backend service optimized for it.

Trade-off

The gateway becomes critical infrastructure because all app traffic passes through it. Run 3+ instances across multiple AZs. While the gateway is unavailable, no location updates arrive and no rides can be matched.

Real world

Lyft built Envoy and runs it as its edge proxy. Uber has described a custom in-house API gateway. Both handle very high request volumes from driver and passenger apps globally.

Capacity math

At peak, 1M active drivers sending location updates every 5 seconds = 200K requests/second to the location service. Spread over a 3-node API gateway cluster, that is about 67K requests/second per node, rising to 100K per node if one instance fails.
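The arithmetic behind these figures can be sketched in a few lines. The function names are illustrative, and the even spread of load across gateway nodes is an assumption; real traffic is rarely perfectly balanced.

```python
def update_rate(active_drivers: int, interval_s: float) -> float:
    """Steady-state location updates per second across the whole fleet."""
    return active_drivers / interval_s


def per_node_load(total_rps: float, nodes: int, failed: int = 0) -> float:
    """Requests per second each surviving gateway node must absorb,
    assuming the load balancer spreads traffic evenly (an assumption)."""
    healthy = nodes - failed
    if healthy <= 0:
        raise ValueError("no healthy gateway nodes left")
    return total_rps / healthy


rps = update_rate(1_000_000, 5)
print(rps)                              # 200000.0
print(per_node_load(rps, 3, failed=1))  # 100000.0
```

Sizing each node for the one-failure case rather than the healthy case keeps a single crash from cascading into overload.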

Add Redis for Driver Location

Add Redis with geospatial commands to store real-time driver locations for fast proximity matching.

What it does

Redis stores real-time driver locations using geospatial commands and enables proximity queries to find available drivers near a passenger.

Why it matters

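Finding the nearest available driver requires querying thousands of moving GPS coordinates. Redis GEORADIUS (deprecated since Redis 6.2 in favor of GEOSEARCH) returns all drivers within a 5 km radius in O(N + log M) time, where N is the number of entries inside the bounding box of the search area and M is the total number of entries in the index: under 5 ms for 100K drivers.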

Trade-off

Redis geospatial takes WGS84 longitude/latitude and models the Earth as a sphere, which is accurate enough for proximity matching. Positions are stored as 52-bit geohash integers (about 0.6 m precision). Drivers update every 5 seconds, so a stored location can be up to 5 seconds stale.
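A quick back-of-the-envelope check puts the two error sources side by side: the resolution of a 52-bit geohash and the distance a driver can travel between updates. The 50 km/h speed is an illustrative assumption, and the circumference is an approximate equatorial value.

```python
EARTH_CIRCUMFERENCE_M = 40_075_017  # approximate equatorial circumference


def geohash_lon_step_m(total_bits: int = 52) -> float:
    """Width at the equator, in meters, of one longitude step.
    A geohash interleaves longitude and latitude bits, so longitude
    gets half of the total."""
    lon_bits = total_bits // 2
    return EARTH_CIRCUMFERENCE_M / (2 ** lon_bits)


def max_drift_m(speed_kmh: float, interval_s: float) -> float:
    """Distance a driver can cover between two consecutive updates."""
    return speed_kmh * 1000 / 3600 * interval_s


print(round(geohash_lon_step_m(), 2))  # 0.6
print(round(max_drift_m(50, 5), 1))    # 69.4
```

Under these assumptions the update interval, not the storage precision, dominates the position error by roughly two orders of magnitude.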

Real world

Uber developed the open-source H3 hexagonal grid and uses it for spatial indexing in dispatch and pricing. Keeping live driver or courier positions in an in-memory geo index such as Redis is a common pattern across ride-hailing and delivery apps.

Capacity math

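1M active drivers × 20 bytes per location = 20MB of raw location data in Redis (sorted-set overhead adds to this, but the index still fits comfortably in memory). GEOADD (location update): 200K/second from 1M drivers updating every 5s. GEORADIUS queries: ~10K/second from passenger app requests.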

Add Postgres for Trips and Users

Add Postgres to store trip records, user profiles, payment history, and driver/passenger state.

What it does

Postgres stores the durable state of all rides, user accounts, driver profiles, and payment records — the business data that must survive beyond in-memory caches.

Why it matters

Redis holds live driver locations (ephemeral, high-frequency). Postgres holds trip history (durable, lower-frequency). A completed trip must be durably stored for billing, disputes, and compliance.

Trade-off

Postgres write throughput limits surge capacity. At 10K trip starts/second (massive surge), Postgres must handle 10K INSERT/second — feasible but requires connection pooling (PgBouncer) and SSD-backed storage.

Real world

Uber uses MySQL for trip data. Lyft uses PostgreSQL. Both shard by driver_id or city for horizontal scaling. Trip records are immutable once completed — append-only for simplicity.

Capacity math

10M trips/day × 1KB per trip record = 10GB/day. Partition by date; after 90 days, archive to Redshift for analytics. Active trip state (< 1M concurrent) fits in 1GB.
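The storage figures and the 90-day archive rule can be sketched as follows. The partition naming scheme is hypothetical, and 1 KB is taken as 1,000 bytes to match the round numbers above.

```python
from datetime import date

TRIPS_PER_DAY = 10_000_000
BYTES_PER_TRIP = 1_000   # 1 KB, decimal
RETENTION_DAYS = 90


def daily_bytes() -> int:
    """Raw trip data written per day."""
    return TRIPS_PER_DAY * BYTES_PER_TRIP


def hot_storage_bytes() -> int:
    """Raw data kept in Postgres before partitions are archived."""
    return daily_bytes() * RETENTION_DAYS


def partition_name(day: date) -> str:
    """Hypothetical naming scheme for one daily partition."""
    return f"trips_{day:%Y_%m_%d}"


def should_archive(partition_day: date, today: date) -> bool:
    """True once a daily partition is older than the retention window."""
    return (today - partition_day).days > RETENTION_DAYS


print(daily_bytes() / 1e9)        # 10.0 (GB per day)
print(hot_storage_bytes() / 1e9)  # 900.0 (GB kept hot)
```

These are raw row sizes only; indexes and replication would add to the 900 GB hot set.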

Add Worker Services for Matching

Add matching engine workers that run the driver-passenger matching algorithm in the background.

What it does

Matching workers run optimization algorithms that pair ride requests with the optimal available driver based on proximity, ETA, driver rating, and surge pricing.

Why it matters

Matching is computationally intensive — comparing N passengers against M drivers with multi-factor optimization. Running this synchronously in the request path adds 500ms-2s latency. Workers run it asynchronously every few seconds.
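To make the batch idea concrete, here is a minimal sketch of one matching pass. It uses a greedy closest-pair heuristic on straight-line distance only, which is a deliberate simplification and not the multi-factor optimization described above; all names are illustrative.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (lat, lon) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def match_batch(requests: Dict[str, Point], drivers: Dict[str, Point],
                max_km: float = 5.0) -> Dict[str, str]:
    """Greedy pass: take the closest (rider, driver) pairs first."""
    pairs = sorted(
        (haversine_km(rider_pos, driver_pos), rider_id, driver_id)
        for rider_id, rider_pos in requests.items()
        for driver_id, driver_pos in drivers.items()
    )
    matched: Dict[str, str] = {}
    used_drivers = set()
    for dist, rider_id, driver_id in pairs:
        if dist > max_km:
            break  # pairs are sorted, so every remaining pair is too far
        if rider_id in matched or driver_id in used_drivers:
            continue
        matched[rider_id] = driver_id
        used_drivers.add(driver_id)
    return matched
```

The all-pairs comparison is O(N × M), which is the reason a real worker would first narrow candidates with a geo query and run per city or per cell.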

Trade-off

Async matching introduces delay (1-5 seconds from request to match). Synchronous matching is faster but blocks the API under load. Uber's dispatch runs in < 2 seconds including matching.

Real world

Uber's dispatch system uses a proprietary matching algorithm running on dedicated worker fleets. Lyft uses Python workers. Both balance match quality vs. match latency.

Capacity math

At 100K active ride requests and 1M available drivers, the matching algorithm runs graph optimization. Uber's matching worker handles one city per worker — 100+ cities = 100+ workers.

§2 Step 3: Deep Dive

Approach	Location update	Nearby search	Surge support	Best for	Cost	Ops burden
Redis GEO (Geohash)	O(log n) per update	O(n+k) GEORADIUS	Aggregate by area	Driver matching ✓	Medium	Low
H3 Hexagons (Uber)	O(1) cell lookup	O(1) neighbor cells	Native (cell aggregation)	Surge pricing, heatmaps	Low	Medium
PostGIS	O(log n)	O(log n) ST_DWithin	Yes, with aggregations	Complex geo queries	Medium	Medium
S2 Cells (Google)	O(1) cell lookup	O(1) parent cells	Yes	Google Maps, routing	Low	Medium
QuadTree (custom)	O(log n)	O(log n)	Yes	Game engines, custom	Low	Medium

Geospatial matching for ride-sharing — H3 hexagons and Redis GEO are the standard.

Ride matching: driver location update + nearest-driver search (Python)

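import json
from typing import Optional

import redis

r = redis.Redis(decode_responses=True)

DRIVER_TTL = 30           # seconds before a driver's heartbeat key expires
SEARCH_KM = 5             # search radius for nearby drivers
MAX_STALE_EVICTIONS = 10  # cap on stale entries removed in one search

def update_driver_location(driver_id: str, city: str,
                           lat: float, lon: float, available: bool = True):
    """Record a driver's position and refresh the heartbeat key."""
    key = f"drivers:{city}:available"
    if available:
        # Redis GEO expects (longitude, latitude, member), in that order.
        r.geoadd(key, (lon, lat, driver_id))
    else:
        r.zrem(key, driver_id)
    # GEO members cannot carry a TTL, so liveness lives in a separate key.
    r.setex(f"driver:{driver_id}", DRIVER_TTL, json.dumps({'lat': lat, 'lon': lon}))

def find_nearest_driver(city: str, lat: float, lon: float) -> Optional[str]:
    """Return the closest live driver, evicting stale index entries on the way."""
    key = f"drivers:{city}:available"
    # A bounded loop replaces the original recursion, which had no depth limit.
    for _ in range(MAX_STALE_EVICTIONS + 1):
        # GEORADIUS is deprecated since Redis 6.2; GEOSEARCH is its replacement.
        results = r.georadius(
            key, lon, lat, SEARCH_KM, unit='km',
            withdist=True, count=1, sort='ASC'
        )
        if not results:
            return None
        driver_id, _dist = results[0]
        if r.exists(f"driver:{driver_id}"):
            return driver_id
        # Heartbeat expired: drop the stale entry and try the next nearest.
        r.zrem(key, driver_id)
    return None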

Component	Why Add It	Tradeoff
API Gateway	Driver and passenger apps make different types of requests: location updates (high frequency, small payload), ride requests (low frequency, needs matching), and ride status (polling).	The gateway becomes critical infrastructure — all app traffic passes through it.
Redis for Driver Location	Finding the nearest available driver requires querying thousands of moving GPS coordinates.	Drivers update every 5 seconds, so a stored location can be up to 5 seconds stale.
Postgres for Trips and Users	Completed trips must be durably stored for billing, disputes, and compliance; Redis only holds ephemeral live locations.	Postgres write throughput limits surge capacity.
Worker Services for Matching	Matching is computationally intensive — comparing N passengers against M drivers with multi-factor optimization.	Async matching introduces delay (1-5 seconds from request to match).

Design decision tradeoffs

Load Balancer Failure

lb-1 crashes. All driver and passenger apps lose the gateway. How do you implement DNS-based failover, multi-AZ load balancers, and client retry logic to reconnect within 10 seconds?
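On the client side, one way to approach the 10-second reconnect target is exponential backoff with full jitter, rotating across endpoints in different AZs. This is a sketch under stated assumptions: the endpoint names, the `connect` callable, and the base/cap values are all hypothetical.

```python
import random
import time
from typing import Callable, List, Optional


def backoff_delays(budget_s: float = 10.0, base_s: float = 0.25,
                   cap_s: float = 2.0, max_attempts: int = 20,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full-jitter delays whose sum never exceeds budget_s."""
    rng = rng or random.Random()
    delays: List[float] = []
    total = 0.0
    for attempt in range(max_attempts):
        ceiling = min(cap_s, base_s * 2 ** attempt)
        delay = rng.uniform(0, ceiling)
        if total + delay > budget_s:
            break
        delays.append(delay)
        total += delay
    return delays


def reconnect(endpoints: List[str], connect: Callable[[str], bool],
              sleep: Callable[[float], None] = time.sleep,
              rng: Optional[random.Random] = None) -> Optional[str]:
    """Try endpoints round-robin: once immediately, then after each delay."""
    for i, delay in enumerate([0.0] + backoff_delays(rng=rng)):
        sleep(delay)
        endpoint = endpoints[i % len(endpoints)]
        if connect(endpoint):
            return endpoint
    return None  # budget exhausted; surface an error to the user
```

Jitter matters here because every app loses its connection at the same moment; without it, all clients would retry in lockstep and hit the surviving load balancer in synchronized waves.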

Rush-Hour Geo Hot Spot

During 6 PM rush hour, 100K drivers and riders concentrate in a 1 km² downtown area, overwhelming the geo index for that region. GEOADD and GEORADIUS ops queue up. How do you shard the geo index by geographic cell and route hot cells to dedicated Redis instances?
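One possible shape for the routing layer is sketched below. It uses a plain latitude/longitude grid as a stand-in for H3 or S2 cells, and the cell size and shard names are assumptions chosen for illustration.

```python
import math
import zlib
from typing import Dict, List

CELL_DEG = 0.01  # roughly 1.1 km of latitude per cell (assumed cell size)


def cell_id(lat: float, lon: float, cell_deg: float = CELL_DEG) -> str:
    """Map a coordinate to a square grid cell (a stand-in for H3/S2)."""
    return f"{math.floor(lat / cell_deg)}:{math.floor(lon / cell_deg)}"


def shard_for(cell: str, shards: List[str], hot_cells: Dict[str, str]) -> str:
    """Route hot cells to their dedicated instance; hash the rest."""
    if cell in hot_cells:
        return hot_cells[cell]
    # crc32 is stable across processes, unlike Python's built-in hash().
    return shards[zlib.crc32(cell.encode()) % len(shards)]
```

A radius search near a cell border has to query the neighboring cells as well, so the reader fans out to several shards and merges the results.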

Driver Location Cache Partition

API servers lose network access to cache-1 (Redis geo). Drivers send location updates but they can't be indexed. Riders can't find nearby drivers. How do you implement local fallback (use stale locations), reconnection logic, and graceful degradation to 'no drivers nearby' instead of errors?
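The degradation path can be sketched as a thin wrapper around the geo query. The class name and the 30-second staleness limit are assumptions, and the built-in `ConnectionError` stands in for whatever exception the real cache client raises.

```python
import time
from typing import Callable, Dict, List, Tuple


class NearbyDriversWithFallback:
    """Serve the last good result per cell while the geo cache is unreachable."""

    def __init__(self, query: Callable[[str], List[str]],
                 max_stale_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._query = query
        self._max_stale_s = max_stale_s
        self._clock = clock
        self._last_good: Dict[str, Tuple[float, List[str]]] = {}

    def nearby(self, cell: str) -> List[str]:
        try:
            drivers = self._query(cell)
        except ConnectionError:
            cached = self._last_good.get(cell)
            if cached and self._clock() - cached[0] <= self._max_stale_s:
                return cached[1]  # stale but still plausible
            return []             # degrade to "no drivers nearby"
        self._last_good[cell] = (self._clock(), drivers)
        return drivers
```

Returning an empty list instead of raising lets the app show "no drivers nearby" and retry, rather than surfacing an error screen.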

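Drivers send GPS updates every 5 seconds via WebSocket. Each update runs GEOADD drivers:cityID longitude latitude driverID in Redis. GEO members cannot carry their own TTL, so liveness is tracked with a separate per-driver key that expires after 30 s, and entries whose key has expired are removed from the index. This stores live positions efficiently.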

On passenger ride request: Matching Engine calls GEORADIUS drivers:cityID lon lat 2 km ASC COUNT 10 to get nearest 10 available drivers. Filter by availability status. Send offer to top 3 simultaneously.
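The filter-and-fan-out step after the geo query can be sketched like this. The candidate list is assumed to be `(driver_id, distance_km)` pairs as a geo query with distances would return them, and the availability set is a hypothetical stand-in for the status lookup.

```python
from typing import Iterable, List, Set, Tuple


def pick_offer_targets(candidates: Iterable[Tuple[str, float]],
                       available: Set[str], fanout: int = 3) -> List[str]:
    """Keep available drivers, order by distance, return the closest few."""
    ranked = sorted(
        (c for c in candidates if c[0] in available),
        key=lambda c: c[1],
    )
    return [driver_id for driver_id, _dist in ranked[:fanout]]


nearby = [("d1", 0.4), ("d2", 0.9), ("d3", 1.1), ("d4", 1.5), ("d5", 1.8)]
print(pick_offer_targets(nearby, available={"d1", "d3", "d4", "d5"}))
# ['d1', 'd3', 'd4']
```

Offering to several drivers at once shortens time-to-accept, but the first acceptance has to cancel the other offers atomically so two drivers are never assigned the same ride.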

Real-time updates to passenger: driver location updates pushed via WebSocket every 5s during a ride. WebSocket gateway looks up the passenger's connection from Redis (passengerID → connectionID) and pushes the update directly.
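A minimal sketch of the lookup-and-push step follows. In-memory dicts stand in for the Redis passengerID → connectionID mapping, and the `send` callable stands in for a real WebSocket send; the class name and message shape are assumptions.

```python
import json
from typing import Callable, Dict

Send = Callable[[str], None]


class ConnectionRegistry:
    """Maps passengers to live connections (dicts stand in for Redis)."""

    def __init__(self) -> None:
        self._conn_by_passenger: Dict[str, str] = {}
        self._send_by_conn: Dict[str, Send] = {}

    def register(self, passenger_id: str, connection_id: str, send: Send) -> None:
        self._conn_by_passenger[passenger_id] = connection_id
        self._send_by_conn[connection_id] = send

    def unregister(self, passenger_id: str) -> None:
        connection_id = self._conn_by_passenger.pop(passenger_id, None)
        if connection_id is not None:
            self._send_by_conn.pop(connection_id, None)

    def push_driver_location(self, passenger_id: str,
                             lat: float, lon: float) -> bool:
        """Push one update; return False if the passenger is not connected."""
        connection_id = self._conn_by_passenger.get(passenger_id)
        send = self._send_by_conn.get(connection_id) if connection_id else None
        if send is None:
            return False
        send(json.dumps({"type": "driver_location", "lat": lat, "lon": lon}))
        return True
```

With several gateway instances, the shared mapping would also need to record which instance holds the connection so the update can be forwarded there.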

§3 Step 4: Wrap Up

Decision	Choice	Why
API Gateway	The API Gateway routes traffic from driver and passenger mobile apps to the appropriate backend services, handling authentication and protocol routing.	Driver and passenger apps make different types of requests: location updates (high frequency, small payload), ride requests (low frequency, needs matching), and ride status (polling).
Redis for Driver Location	Redis stores real-time driver locations using geospatial commands and enables proximity queries to find available drivers near a passenger.	Finding the nearest available driver requires querying thousands of moving GPS coordinates.
Postgres for Trips and Users	Postgres stores the durable state of all rides, user accounts, driver profiles, and payment records: the business data that must survive beyond in-memory caches.	Completed trips must be durably stored for billing, disputes, and compliance; Redis only holds ephemeral live locations.
Worker Services for Matching	Matching workers run optimization algorithms that pair ride requests with the optimal available driver based on proximity, ETA, driver rating, and surge pricing.	Matching is computationally intensive — comparing N passengers against M drivers with multi-factor optimization.

Key design decisions

If the interviewer asks to scale 10×: identify the single bottleneck (usually the database write path) and address it first, before scaling horizontally.

10× target: 6.0M RPS, where your architecture must hold.
