Proximity Service

Beginner · Geospatial · Search · 35 min read

Step 2: High-Level Design

Find nearby businesses with geohash and quadtree indexing. Design Yelp's core search backend.

System architecture overview (progressive build: add each component step by step)

Add Redis for Geospatial Queries

Connect Redis to the API using its native geospatial commands (GEOADD, GEORADIUS) to find nearby places.

What it does

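Redis stores geospatial points in a sorted set, scoring each member by a 52-bit geohash of its latitude/longitude, and supports radius queries (GEORADIUS, or GEOSEARCH from Redis 6.2) in O(N + log M) time, where N is the number of points in the bounding box around the search area and M is the number of points in the index.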

Why it matters

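Finding places within 1 km of the user requires geospatial indexing. A plain lat/lng range query in Postgres ignores the Earth's curvature (a degree of longitude shrinks toward the poles) and cannot use a single B-tree index to narrow both dimensions at once, so it scales poorly. Redis GEORADIUS instead uses a geohash-encoded sorted set, where nearby points tend to have nearby scores, for efficient proximity queries.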
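To make the geohash idea concrete: a geohash alternately halves the longitude and latitude ranges, records one bit per halving, and packs every 5 bits into a base32 character, so points that share a prefix lie in the same cell. The sketch below is a minimal pure-Python encoder for illustration only. Redis itself stores a 52-bit integer geohash as the sorted-set score rather than a string, and `geohash_encode` is a hypothetical helper, not part of any Redis client.

```python
# Standard geohash base32 alphabet (omits a, i, l, o)
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a point as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    even = True  # geohash starts with a longitude bit, then alternates
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1  # point is in the upper half
                lon_lo = mid
            else:
                bits <<= 1  # point is in the lower half
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

For example, `geohash_encode(57.64911, 10.40744, 5)` returns `"u4pru"`. Truncating a longer hash gives the enclosing, coarser cell, which is what makes prefix-based cell keys work.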

Trade-off

Redis geospatial data lives in memory — suitable for hot data (restaurants near me) but not full datasets. For cold geospatial data, PostGIS in Postgres is better.

Real world

Uber uses Redis for real-time driver location lookups. Yelp uses Redis for "restaurants near me" queries. Lyft's dispatch system uses Redis geospatial for driver matching.

Capacity math

Redis can store 1B+ geo points per node given enough RAM, since memory rather than the index is the limit. A GEORADIUS call returning 100 nearby results runs in under 1 ms even with 100M stored locations, because its cost depends mostly on how many points fall near the search area (N), with only a log M term for the index size.

Add Postgres for Place Data

Connect Postgres (with PostGIS extension) to store the full place database — names, addresses, categories, ratings.

What it does

Postgres with the PostGIS extension stores the full, durable place database and supports SQL queries with geographic operators for complex spatial queries.

Why it matters

Proximity queries need two layers: fast 'what IDs are near me?' (Redis) and slower 'fetch the full data for these IDs' (Postgres). Postgres stores the source of truth — names, hours, photos, ratings — that Redis doesn't need to cache.
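The two-layer read path can be sketched as follows. This is a minimal sketch, assuming the geo index returns `(place_id, distance_km)` pairs nearest-first and the database layer can fetch many rows in one query. `nearby_ids` and `fetch_places` are hypothetical injected callables standing in for a Redis GEOSEARCH call and a Postgres `SELECT ... WHERE id = ANY(...)` query.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical layer interfaces: a geo index returning (place_id, distance_km)
# pairs nearest-first, and a database call returning {place_id: row} for a batch
NearbyIds = Callable[[float, float, float], Sequence[Tuple[str, float]]]
FetchPlaces = Callable[[Iterable[str]], Dict[str, dict]]


def search_places(nearby_ids: NearbyIds, fetch_places: FetchPlaces,
                  lat: float, lon: float, radius_km: float) -> List[dict]:
    # Layer 1: fast "what IDs are near me?" lookup
    hits = list(nearby_ids(lat, lon, radius_km))
    if not hits:
        return []
    # Layer 2: one batched fetch for all IDs instead of one round trip per ID
    rows = fetch_places([place_id for place_id, _ in hits])
    results = []
    for place_id, distance_km in hits:  # keep the index's nearest-first order
        row = rows.get(place_id)
        if row is None:
            # The index and the database can briefly disagree (e.g. a deleted place)
            continue
        results.append({**row, "distance_km": distance_km})
    return results


if __name__ == "__main__":
    # In-memory fakes standing in for Redis and Postgres
    fake_index = lambda lat, lon, r: [("p2", 0.2), ("p1", 0.5), ("gone", 0.7)]
    fake_db = lambda ids: {"p1": {"id": "p1", "name": "Cafe"},
                           "p2": {"id": "p2", "name": "Deli"}}
    print(search_places(fake_index, fake_db, 40.75, -73.98, 1.0))
```

Keeping the distance order from the index and dropping IDs the database no longer has means Postgres never needs to re-sort or re-filter by distance.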

Trade-off

PostGIS queries are more powerful than Redis geo (they support polygons, and road-network routing via the pgRouting extension) but slower for simple radius queries. Use Redis for the hot path and Postgres for complex geospatial analytics.

Real world

Yelp uses Postgres for place data. Foursquare uses Postgres with PostGIS for venue data. Google Maps uses both in-memory geo indexes and durable databases.

Capacity math

Postgres handles 10M+ place records easily. A PostGIS GiST spatial index on the location column lets ST_DWithin radius queries run in under 10 ms even on 100M rows.

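Spatial indexes such as GiST generally answer radius queries in two phases: a cheap bounding-box filter that the index can evaluate, then an exact distance check on the survivors. The sketch below mimics that pattern in plain Python on a spherical Earth model. It is not PostGIS's implementation (PostGIS geography defaults to a spheroid), and the function names are illustrative. It also shows the curvature correction a naive lat/lng range query skips: the longitude half-width of the box grows with latitude.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical model)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a sphere."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat: float, lon: float, radius_km: float) -> Tuple[float, float, float, float]:
    # Assumes the circle contains no pole and does not cross the +/-180 meridian
    r = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(r)
    # Longitude half-width of a spherical cap: wider toward the poles
    dlon = math.degrees(math.asin(math.sin(r) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def within_radius(points: Iterable[Tuple[str, float, float]],
                  lat: float, lon: float, radius_km: float) -> List[Tuple[str, float]]:
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    found = []
    for pid, plat, plon in points:
        # Phase 1: rectangle test, the part an index can answer cheaply
        if not (min_lat <= plat <= max_lat and min_lon <= plon <= max_lon):
            continue
        # Phase 2: exact great-circle check removes the box's corners
        d = haversine_km(lat, lon, plat, plon)
        if d <= radius_km:
            found.append((pid, d))
    return sorted(found, key=lambda item: item[1])
```

The corner case is the point of the second phase: at the equator, `(0.08, 0.08)` falls inside the 10 km bounding box around `(0, 0)` but lies about 12.6 km away, so `within_radius` drops it.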
Add a Cache for Geo Queries

At high traffic, cache frequent geo proximity queries so you're not running expensive spatial index scans on every request.

What it does

A Redis cache stores the results of proximity queries, keyed by geohash cell and search radius.

Why it matters

Spatial index scans are expensive. Users in the same neighborhood tend to run the same searches, so a cache can serve repeat queries in under 1 ms.

Trade-off

Cached results may miss newly added locations. Use short TTLs (30-60s) for frequently changing datasets.
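A minimal sketch of a result cache keyed by geohash cell and radius, with a TTL in the 30–60 s range suggested above. It is an in-process stand-in so it runs without a server; against Redis the same idea maps to `SET key value EX ttl`. The key format `geo:<cell>:r<radius>` and the `GeoResultCache` class are assumptions, and the caller is assumed to supply the cell (for example, a geohash truncated to 6 characters).

```python
import time
from typing import Any, Callable, Dict, Tuple


class GeoResultCache:
    """In-process stand-in for a Redis result cache keyed by (geohash cell, radius)."""

    def __init__(self, ttl_s: float = 45.0, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock  # injectable so tests can control time
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    @staticmethod
    def key(cell: str, radius_km: float) -> str:
        # Normalizing the radius keeps 1 and 1.0 on the same key
        return f"geo:{cell}:r{float(radius_km)}"

    def get_or_compute(self, cell: str, radius_km: float, compute: Callable[[], Any]) -> Any:
        k = self.key(cell, radius_km)
        now = self.clock()
        entry = self._store.get(k)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: no spatial index scan
        # Miss or expired: run the query. It should cover the whole cell,
        # not just the first caller's exact point, since all cell users share it.
        value = compute()
        self._store[k] = (now + self.ttl_s, value)
        return value
```

A short TTL bounds staleness: a newly added place appears in cached results at most one TTL after it is indexed.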

Real world

Uber uses Redis with geohash-based caching for their supply positioning. Yelp caches business search results per area.

Capacity math

Redis GEO commands handle 100K+ proximity queries per second. One 8GB node caches millions of location results.

Add a Load Balancer

At peak, distribute geo query traffic across multiple API nodes behind a load balancer.

What it does

A load balancer routes geo proximity requests across multiple stateless API server instances.

Why it matters

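At peak (e.g., event discovery during rush hour), search volume spikes 10x. Because API nodes are stateless, adding nodes absorbs the spike at the API tier, while the Redis geo index and result cache keep most of the extra reads away from Postgres.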

Trade-off

Round-robin load balancing works well here since nodes are fully stateless. Sticky sessions are not needed.
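Because no session state lives on an API node, any node can serve any request, so plain round robin is enough. A sketch, assuming hypothetical node names and a separate health-check process that fills `unhealthy`:

```python
import itertools
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Cycles through stateless API nodes, skipping ones marked unhealthy."""

    def __init__(self, nodes: Iterable[str]):
        self.nodes: List[str] = list(nodes)
        self.unhealthy: Set[str] = set()  # maintained by a health checker
        self._cycle = itertools.cycle(self.nodes)

    def next_node(self) -> str:
        # At most one full pass over the ring; no stickiness is needed
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node not in self.unhealthy:
                return node
        raise RuntimeError("no healthy API nodes")
```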

Real world

Google Maps, Yelp, and Foursquare all run horizontally scaled proximity APIs behind load balancers.

Capacity math

Each API node handles 5-10K geo queries/second. Three nodes = 15-30K queries/second.
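The capacity line generalizes to a small formula: nodes = ceil(target RPS / (per-node RPS × (1 − headroom))). The sketch below assumes a 30% headroom so no node runs at full load; that margin is an assumption, not a figure from this design.

```python
import math


def api_nodes_needed(target_rps: float, per_node_rps: float, headroom: float = 0.3) -> int:
    """Nodes needed so each runs at most (1 - headroom) of its capacity."""
    usable_rps = per_node_rps * (1 - headroom)
    return math.ceil(target_rps / usable_rps)
```

At the 10× target of 50K RPS, `api_nodes_needed(50_000, 5_000)` gives 15 nodes and `api_nodes_needed(50_000, 10_000)` gives 8; with no headroom, 5K RPS per node needs exactly 10.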

Step 3: Deep Dive

Strategy	Query type	Precision control	Range query	Best for	Cost	Ops burden
Redis GEO (geohash)	Radius or box search	Geohash precision 1–12 characters	Yes (GEOSEARCH, GEORADIUS)	Drivers, restaurants, real-time lookups	Medium	Low
S2 cells (Google)	Region/radius	Level 0–30 (about 1 cm at level 30)	Yes	Google Maps, large-scale routing	Low	Medium
PostGIS	Any shape	Full floating-point precision	Yes (ST_DWithin)	Complex polygons, analytics	Medium	Medium
H3 hexagons (Uber)	Hex grid aggregation	Resolution 0–15	Yes	Surge pricing, heatmaps	Low	Medium
Quadtree	Bounding box	Tree depth	Yes	Spatial databases, game engines	Low	Medium

Geospatial indexing strategies. Geohash and S2 are widely used production standards.

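Redis GEO: add a driver location and search for nearby drivers (Python)

```python
import time

import redis

r = redis.Redis(host='redis-primary', decode_responses=True)

OFFLINE_AFTER_S = 30  # a driver counts as offline after 30s without an update


def update_driver_location(driver_id: str, lon: float, lat: float, city: str) -> None:
    # GEOADD: O(log N); stored in a sorted set scored by the point's geohash
    r.geoadd(f"drivers:{city}", (lon, lat, driver_id))
    # EXPIRE would apply to the whole city key, not to one driver, so record a
    # per-driver heartbeat in a companion sorted set instead
    r.zadd(f"drivers:{city}:last_seen", {driver_id: time.time()})


def prune_offline_drivers(city: str) -> None:
    # Drop drivers whose last heartbeat is older than OFFLINE_AFTER_S.
    # Not atomic: a Lua script would stop a just-updated driver from being pruned.
    cutoff = time.time() - OFFLINE_AFTER_S
    stale = r.zrangebyscore(f"drivers:{city}:last_seen", 0, cutoff)
    if stale:
        r.zrem(f"drivers:{city}", *stale)
        r.zrem(f"drivers:{city}:last_seen", *stale)


def find_nearby_drivers(lon: float, lat: float, city: str,
                        radius_km: float = 2.0, limit: int = 10) -> list:
    prune_offline_drivers(city)
    # GEOSEARCH (Redis >= 6.2) replaces the deprecated GEORADIUS;
    # sort='ASC' returns the nearest drivers first
    results = r.geosearch(
        f"drivers:{city}",
        longitude=lon, latitude=lat,
        radius=radius_km, unit='km',
        withcoord=True,
        withdist=True,
        count=limit,
        sort='ASC',
    )
    # With withdist and withcoord, each row is [member, distance, (lon, lat)]
    return [
        {'driver_id': row[0], 'distance_km': row[1], 'coords': row[2]}
        for row in results
    ]


if __name__ == '__main__':
    # Usage: user at (lon=-73.98, lat=40.75) in NYC wants a ride
    nearby = find_nearby_drivers(-73.98, 40.75, city='nyc', radius_km=2.0)
    # e.g. [{'driver_id': 'd42', 'distance_km': 0.3, 'coords': (...)}, ...]
    print(nearby)
```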

Component	Why Add It	Tradeoff
Redis for Geospatial Queries	Finding places within 1 km of the user requires geospatial indexing.	Redis geospatial data lives in memory, which suits hot data (restaurants near me) but not full datasets.
Postgres for Place Data	Proximity queries need two layers: fast 'what IDs are near me?' (Redis) and slower 'fetch the full data for these IDs' (Postgres).	PostGIS queries are more powerful than Redis geo (polygons, and routing via pgRouting) but slower for simple radius queries.
Cache for Geo Queries	Spatial index scans are expensive.	Cached results may miss newly added locations.
Load Balancer	At peak (e.g., event discovery during rush hour), search volume spikes 10x.	Round-robin load balancing works well here since nodes are fully stateless.

Design decision tradeoffs

Surge in Location Queries

A local event (stadium opening) causes 100x normal proximity queries in a 1km radius. The geo index (geo-1) is saturated. How do you cache proximity results for popular areas and implement request coalescing to prevent redundant queries?
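Request coalescing (often called single-flight, after Go's `golang.org/x/sync/singleflight` package) lets the first request for a key run the query while concurrent requests for the same key wait for its result. A minimal Python sketch, assuming the key is the same cell-and-radius key used for cached proximity results; `SingleFlight` is a hypothetical class. It only coalesces within one API process; across nodes, the shared result cache absorbs the remaining duplicates.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """One in-flight query that followers can wait on."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Collapse concurrent identical requests into one backend call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
        if not leader:
            call.done.wait()  # follower: reuse the leader's result
            if call.error is not None:
                raise call.error
            return call.result
        try:
            call.result = fn()  # leader: the only caller that hits geo-1
        except BaseException as exc:
            call.error = exc
            raise
        finally:
            with self._lock:
                del self._inflight[key]  # later requests start a new flight
            call.done.set()
        return call.result
```

In the surge scenario, the leader would also write its result to the proximity cache, so requests arriving after it finishes become plain cache hits instead of new flights.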

Geographic Hot Spot

All queries are concentrated in one city block during rush hour, creating a hot partition on geo-1's geographic cell. How do you detect hot cells and split them into finer-grained sub-cells or route to additional replicas?
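Hot-cell detection can start as a per-window counter over the geohash cell of each query. A sketch, assuming queries arrive as geohash cell strings and that the window length and threshold are tuned elsewhere; the function names are hypothetical. Splitting relies on a geohash property: appending one of the 32 base32 characters yields the 32 sub-cells of a cell, so a hot cell's traffic and cache keys can move to finer cells or be spread across replicas.

```python
from collections import Counter
from typing import Dict, Iterable, List

GEOHASH_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def child_cells(cell: str) -> List[str]:
    # Appending one base32 character splits a geohash cell into 32 sub-cells
    return [cell + c for c in GEOHASH_BASE32]


def find_hot_cells(query_cells: Iterable[str], threshold: int) -> List[str]:
    """Cells whose query count in the current window exceeds threshold, hottest first."""
    counts = Counter(query_cells)
    return [cell for cell, n in counts.most_common() if n > threshold]


def split_plan(query_cells: Iterable[str], threshold: int) -> Dict[str, List[str]]:
    # Map each hot cell to the finer sub-cells its traffic should move to
    return {cell: child_cells(cell) for cell in find_hot_cells(query_cells, threshold)}
```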

Proximity Cache Eviction

cache-1 runs out of memory and evicts all cached proximity results. Every subsequent proximity query falls through to geo-1, overwhelming it. How do you implement graceful degradation and cache capacity planning?
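One graceful-degradation pattern is to cap how many cache misses may reach geo-1 at once and answer the overflow with the last known good result, or a fast empty response, instead of queuing. A minimal sketch; `DegradingGeoLookup`, the `max_inflight` limit, and the status strings are hypothetical. On the capacity side, Redis's `maxmemory` setting with an eviction policy such as `allkeys-lru` removes the least recently used keys as memory fills, rather than the whole cache at once.

```python
import threading
from typing import Any, Callable, Dict, Optional, Tuple


class DegradingGeoLookup:
    """Caps concurrent misses that reach the geo index; overflow gets stale or empty data."""

    def __init__(self, max_inflight: int) -> None:
        self._slots = threading.BoundedSemaphore(max_inflight)
        # Last good result per key, kept outside the evictable cache.
        # Unbounded here; a real system would bound it (e.g. with an LRU).
        self._last_good: Dict[str, Any] = {}

    def lookup(self, key: str, cached: Optional[Any],
               query: Callable[[], Any]) -> Tuple[Any, str]:
        if cached is not None:
            return cached, "cache"
        if self._slots.acquire(blocking=False):
            try:
                result = query()  # one of at most max_inflight queries on geo-1
                self._last_good[key] = result
                return result, "fresh"
            finally:
                self._slots.release()
        if key in self._last_good:
            return self._last_good[key], "stale"
        return [], "shed"  # fail fast instead of piling onto geo-1
```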

Redis GEO commands (GEOADD, GEORADIUS) use a geohash internally. Each driver location is stored with: GEOADD city:drivers longitude latitude driverID.

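GEORADIUS city:drivers lon lat 2 km ASC (or, from Redis 6.2, GEOSEARCH city:drivers FROMLONLAT lon lat BYRADIUS 2 km ASC) returns all drivers within 2 km, sorted by distance. This is an O(N + log M) operation, where N is the number of drivers in the bounding box around the 2 km circle and M is the total number of drivers in the key, so it stays fast at typical driver densities.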

Redis stores live locations (updated every 5 s; a driver silent for 30 s counts as offline, tracked per driver because a sorted-set member cannot expire on its own). Postgres stores the historical location trail for trip records and audit. Data flow: API → Redis (live queries), API → Postgres (history writes).
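With updates every 5 s, the history write rate is simply active drivers / 5 per second, which is why trail writes are usually batched rather than inserted one row at a time. A sketch, assuming a `flush_fn` that performs one multi-row INSERT (or COPY) per batch; `HistoryBuffer` and its batch size are hypothetical, and a real buffer would also flush on a timer so quiet periods do not delay writes.

```python
from typing import Callable, List, Tuple

LocationRow = Tuple[str, float, float, float]  # (driver_id, lon, lat, unix_ts)


class HistoryBuffer:
    """Batches location-trail rows before writing them to Postgres."""

    def __init__(self, flush_fn: Callable[[List[LocationRow]], None],
                 max_rows: int = 500) -> None:
        self.flush_fn = flush_fn  # e.g. one multi-row INSERT or COPY per batch
        self.max_rows = max_rows
        self._rows: List[LocationRow] = []

    def add(self, row: LocationRow) -> None:
        self._rows.append(row)
        if len(self._rows) >= self.max_rows:
            self.flush()

    def flush(self) -> None:
        if self._rows:
            batch, self._rows = self._rows, []
            self.flush_fn(batch)


def history_writes_per_second(active_drivers: int, update_interval_s: float = 5.0) -> float:
    # Every active driver sends one location update per interval
    return active_drivers / update_interval_s
```

For example, 100,000 active drivers (an illustrative number) produce `history_writes_per_second(100_000)` = 20,000 rows per second; at 500 rows per batch that is 40 INSERTs per second.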

Step 4: Wrap Up

Decision	Choice	Why
Redis for Geospatial Queries	Redis stores latitude/longitude points in a geohash-scored sorted set and supports radius queries (GEORADIUS, GEOSEARCH) in O(N + log M) time.	Finding places within 1 km of the user requires geospatial indexing.
Postgres for Place Data	Postgres with the PostGIS extension stores the full, durable place database and supports SQL queries with geographic operators for complex spatial queries.	Proximity queries need two layers: fast 'what IDs are near me?' (Redis) and slower 'fetch the full data for these IDs' (Postgres).
Cache for Geo Queries	A Redis cache stores the results of proximity queries, keyed by geohash cell and search radius.	Spatial index scans are expensive.
Load Balancer	A load balancer routes geo proximity requests across multiple stateless API server instances.	At peak (e.g., event discovery during rush hour), search volume spikes 10x.

Key design decisions

If the interviewer asks to scale 10×, keep search fast as the index grows: shard the index by geography (for example by city or geohash prefix) or by entity type, so each shard stays small and queries stay fast.
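For a proximity index, a natural shard key is geography: route each point by a geohash prefix so nearby points, and therefore most radius queries, land on one shard. A sketch, assuming hypothetical shard names and a 4-character prefix; a query whose circle crosses a prefix boundary still has to fan out to the neighboring cells' shards.

```python
import hashlib
from typing import List


def shard_for(geohash: str, shards: List[str], prefix_len: int = 4) -> str:
    """Route a location to a shard by its geohash prefix, so nearby points share a shard."""
    prefix = geohash[:prefix_len]
    # Stable hash: Python's built-in hash() of str is randomized per process
    digest = hashlib.sha1(prefix.encode()).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

Hashing the prefix rather than mapping it directly spreads cells evenly across shards; a hot prefix can still overload one shard, which is where the hot-cell splitting from the deep dive comes back in.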

10× target: 50K RPS, the load your architecture must hold.
