Proximity Service
§1Step 2 — High-Level Design
Find nearby businesses with geohash and quadtree indexing. Design Yelp's core search backend.
Connect Redis to the API using its native geospatial commands (GEOADD, GEORADIUS) to find nearby places.
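Redis has native geospatial commands that store longitude/latitude pairs in a sorted set and support radius queries (GEORADIUS, or GEOSEARCH from Redis 6.2 onward, where GEORADIUS is deprecated) in O(N + log M) time, where N is the number of members inside the bounding box of the search area and M is the total number of members in the index.
Finding places within 1km of the user requires geospatial indexing. A simple lat/lng range query in Postgres treats degrees as a flat grid, so it ignores the Earth's curvature (a degree of longitude covers less ground toward the poles), and an ordinary B-tree index can narrow only one of the two dimensions efficiently, so it doesn't scale. Redis GEORADIUS uses a geohash-encoded sorted set for efficient proximity queries.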
Redis geospatial data lives in memory — suitable for hot data (restaurants near me) but not full datasets. For cold geospatial data, PostGIS in Postgres is better.
Uber uses Redis for real-time driver location lookups. Yelp uses Redis for "restaurants near me" queries. Lyft's dispatch system uses Redis geospatial for driver matching.
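A Redis sorted set can hold up to 2^32 - 1 members, so the number of geo points per node is limited by memory rather than by the data structure. Because query cost depends mainly on the members inside the searched area, a GEORADIUS call returning 100 nearby results can stay under 1ms even with 100M stored locations.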
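To make the curvature point concrete, here is a minimal sketch of the haversine great-circle distance, which is the kind of calculation a flat lat/lng range query skips. The function name and the mean Earth radius constant are illustrative choices, not something taken from Redis or Postgres.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The same 0.01 degree step in longitude covers different ground distances.
at_equator = haversine_km(0.0, 0.0, 0.0, 0.01)
at_60_north = haversine_km(60.0, 0.0, 60.0, 0.01)
print(f"{at_equator:.3f} km at the equator, {at_60_north:.3f} km at 60N")
```

A fixed "plus or minus 0.01 degrees" box therefore covers roughly half the east-west distance at 60 degrees north that it covers at the equator, which is why a radius query needs a real distance function or a spatial index.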
Connect Postgres (with PostGIS extension) to store the full place database — names, addresses, categories, ratings.
Postgres with the PostGIS extension stores the full, durable place database and supports SQL queries with geographic operators for complex spatial queries.
Proximity queries need two layers: fast 'what IDs are near me?' (Redis) and slower 'fetch the full data for these IDs' (Postgres). Postgres stores the source of truth — names, hours, photos, ratings — that Redis doesn't need to cache.
PostGIS queries are more powerful than Redis geo (polygons natively, road network routing through the pgRouting extension) but slower for simple radius queries. Use Redis for the hot path, Postgres for complex geospatial analytics.
Yelp uses Postgres for place data. Foursquare uses Postgres with PostGIS for venue data. Google Maps uses both in-memory geo indexes and durable databases.
Postgres handles 10M+ place records easily. With a GiST spatial index on the location column, ST_DWithin radius queries typically run in < 10ms even on 100M rows.
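The two-layer read path described above can be sketched as follows. This is an illustration under assumptions: `geo_lookup` stands in for the Redis radius query and `fetch_places` for a Postgres lookup by primary key. Both are hypothetical callables injected by the caller, so the sketch runs without either service.

```python
from typing import Callable, Dict, List, Sequence, Tuple

GeoHit = Tuple[str, float]  # (place_id, distance_km)


def nearby_places(
    lon: float,
    lat: float,
    radius_km: float,
    geo_lookup: Callable[[float, float, float], Sequence[GeoHit]],
    fetch_places: Callable[[List[str]], Dict[str, dict]],
) -> List[dict]:
    """Layer 1 finds nearby IDs; layer 2 hydrates them from the source of truth."""
    hits = geo_lookup(lon, lat, radius_km)  # assumed sorted by distance
    if not hits:
        return []
    rows = fetch_places([place_id for place_id, _ in hits])
    results = []
    for place_id, distance_km in hits:
        row = rows.get(place_id)
        if row is None:
            # The geo index can briefly reference a place the database no longer has.
            continue
        results.append({**row, "distance_km": distance_km})
    return results


# In-memory stand-ins for the two stores (hypothetical data).
_places = {"p1": {"id": "p1", "name": "Cafe A"}, "p2": {"id": "p2", "name": "Diner B"}}
fake_geo = lambda lon, lat, radius_km: [("p2", 0.2), ("p9", 0.5), ("p1", 0.9)]
fake_db = lambda ids: {i: _places[i] for i in ids if i in _places}
print(nearby_places(-73.98, 40.75, 1.0, fake_geo, fake_db))
```

Keeping the distance ordering from the geo layer and skipping IDs the database does not return lets the two stores drift slightly out of sync without breaking the response.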
At high traffic, cache frequent geo proximity queries so you're not running expensive spatial index scans on every request.
A Redis cache stores the results of proximity queries, keyed by geohash cell and search radius.
Spatial index scans are expensive. Users in the same neighborhood search for the same results — a cache serves them in <1ms.
Cached results may miss newly added locations. Use short TTLs (30-60s) for frequently changing datasets.
Uber uses Redis with geohash-based caching for their supply positioning. Yelp caches business search results per area.
Redis GEO commands handle 100K+ proximity queries per second. One 8GB node caches millions of location results.
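One way to key cached results by geohash cell and search radius is sketched below. The encoder follows the standard geohash bisection scheme. The `nearby:{cell}:{radius}` key layout and the default precision of 6 (a cell of roughly 1.2 km by 0.6 km) are assumptions for illustration, not a format the source prescribes.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a point as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def cache_key(lat: float, lon: float, radius_m: int, precision: int = 6) -> str:
    """Hypothetical key layout: users in the same cell share one cache entry."""
    return f"nearby:{geohash_encode(lat, lon, precision)}:{int(radius_m)}"


# Two users a few metres apart map to the same key, so the second is a cache hit.
print(cache_key(40.7500, -73.9800, 1000) == cache_key(40.7501, -73.9801, 1000))
```

The trade-off is that every user in a cell receives results computed for one representative query, so a coarser precision raises the hit rate at the cost of less exact distances near the cell edges.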
At peak, distribute geo query traffic across multiple API nodes behind a load balancer.
A load balancer routes geo proximity requests across multiple stateless API server instances.
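At peak (e.g., event discovery during rush hour), search volume spikes 10x. Horizontal scaling of the stateless API tier absorbs the request volume, while the cache in front of the geo index keeps the spike from turning into database pressure.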
Round-robin load balancing works well here since nodes are fully stateless. Sticky sessions are not needed.
Google Maps, Yelp, and Foursquare all run horizontally scaled proximity APIs behind load balancers.
Each API node handles 5-10K geo queries/second. Three nodes = 15-30K queries/second.
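The capacity arithmetic above can be written as a small helper for sizing the API tier. The `headroom` parameter is an assumption added here for illustration (spare capacity for node failures or bursts); the source only gives the per-node throughput range.

```python
import math


def nodes_needed(peak_qps: float, per_node_qps: float, headroom: float = 0.0) -> int:
    """Number of API nodes required to serve peak_qps.

    headroom is a fraction of extra capacity to provision, e.g. 0.25 for 25%.
    """
    if per_node_qps <= 0:
        raise ValueError("per_node_qps must be positive")
    return math.ceil(peak_qps * (1 + headroom) / per_node_qps)


print(nodes_needed(15_000, 5_000))        # lower bound of the range above
print(nodes_needed(30_000, 10_000))       # upper bound of the range above
print(nodes_needed(15_000, 5_000, 0.25))  # same load with 25% headroom
```

Sizing with the lower per-node figure (5K queries/second) is the conservative choice when the real throughput is unknown.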
§2Step 3 — Deep Dive
| Strategy | Query type | Precision control | Range query | Best for | Cost | Ops burden |
|---|---|---|---|---|---|---|
| Redis GEO (Geohash) | Radius search | Fixed 52-bit geohash | Yes (GEORADIUS, GEOSEARCH) | Drivers, restaurants, real-time ✓ | Medium | Low |
| S2 Cells (Google) | Region/radius | Level 0–30 (cm precision) | Yes | Google Maps, large-scale routing | Low | Medium |
| PostGIS | Any shape | Full float precision | Yes (ST_DWithin) | Complex polygons, analytics | Medium | Medium |
| H3 Hexagons (Uber) | Hex grid aggregation | Level 0–15 | Yes | Surge pricing, heatmaps | Low | Medium |
| QuadTree | Bounding box | Tree depth | Yes | Spatial databases, game engines | Low | Medium |
Geospatial indexing strategies — Geohash and S2 are the production standards.
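```python
import time

import redis

r = redis.Redis(host='redis-primary', decode_responses=True)

DRIVER_TTL_S = 30  # a driver counts as offline after 30s without an update


def update_driver_location(driver_id: str, lon: float, lat: float, city: str):
    # GEOADD is O(log N) per member; the point is stored as a geohash score
    # in a sorted set.
    r.geoadd(f"drivers:{city}", (lon, lat, driver_id))
    # EXPIRE applies to the whole key, not to one member, so per-driver
    # staleness is tracked in a companion sorted set (an added helper key).
    r.zadd(f"drivers:{city}:seen", {driver_id: time.time()})


def prune_stale_drivers(city: str):
    # Remove drivers that have not reported within DRIVER_TTL_S.
    cutoff = time.time() - DRIVER_TTL_S
    stale = r.zrangebyscore(f"drivers:{city}:seen", '-inf', cutoff)
    if stale:
        r.zrem(f"drivers:{city}", *stale)
        r.zrem(f"drivers:{city}:seen", *stale)


def find_nearby_drivers(lon: float, lat: float, city: str,
                        radius_km: float = 2.0, limit: int = 10) -> list:
    prune_stale_drivers(city)  # could also run on a timer instead of per query
    # GEORADIUS returns drivers sorted by distance
    # (deprecated since Redis 6.2; GEOSEARCH is the replacement).
    results = r.georadius(
        f"drivers:{city}",
        lon, lat,
        radius_km, unit='km',
        withcoord=True,
        withdist=True,
        count=limit,
        sort='ASC'
    )
    return [
        {'driver_id': name, 'distance_km': dist, 'coords': coords}
        for name, dist, coords in results
    ]


# Usage: user at (lon=-73.98, lat=40.75) in NYC wants a ride
nearby = find_nearby_drivers(-73.98, 40.75, city='nyc', radius_km=2.0)
# Returns: [{'driver_id': 'd42', 'distance_km': 0.3, ...}, ...]
```

| Component | Why Add It | Tradeoff |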
|---|---|---|
| Redis for Geospatial Queries | Finding places within 1km of the user requires geospatial indexing. | Redis geospatial data lives in memory — suitable for hot data (restaurants near me) but not full datasets. |
| Postgres for Place Data | Proximity queries need two layers: fast 'what IDs are near me?' (Redis) and slower 'fetch the full data for these IDs' (Postgres). | PostGIS queries are more powerful than Redis geo (polygons natively, road network routing through the pgRouting extension) but slower for simple radius queries. |
| Cache for Geo Queries | Spatial index scans are expensive. | Cached results may miss newly added locations. |
| Load Balancer | At peak (e.g., event discovery during rush hour), search volume spikes 10x. | Round-robin load balancing works well here since nodes are fully stateless. |
Design decision tradeoffs
A local event (stadium opening) causes 100x normal proximity queries in a 1km radius. The geo index (geo-1) is saturated. How do you cache proximity results for popular areas and implement request coalescing to prevent redundant queries?
All queries are concentrated in one city block during rush hour, creating a hot partition on geo-1's geographic cell. How do you detect hot cells and split them into finer-grained sub-cells or route to additional replicas?
cache-1 runs out of memory and evicts all cached proximity results. Every subsequent proximity query falls through to geo-1, overwhelming it. How do you implement graceful degradation and cache capacity planning?
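For the request coalescing mentioned in the first scenario, one common approach is a "single-flight" wrapper: concurrent callers asking for the same cache key share one backend query instead of each issuing their own. The sketch below is in-process only and assumes a threaded API server. The `SingleFlight` class and its `do` method are hypothetical names, and coalescing across several API nodes would need a shared lock or queue that is not shown.

```python
import threading
from concurrent.futures import Future
from typing import Any, Callable, Dict


class SingleFlight:
    """Collapse concurrent calls for the same key into one execution."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, Future] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            future = self._inflight.get(key)
            is_leader = future is None
            if is_leader:
                future = Future()
                self._inflight[key] = future
        if is_leader:
            # Only the first caller runs the query; the others wait on its result.
            try:
                future.set_result(fn())
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    self._inflight.pop(key, None)
        return future.result()
```

A handler would wrap its geo index lookup as `flight.do(key, lambda: run_query(...))`, where `key` is the same cell-and-radius key used for caching. During a spike concentrated in one area, the geo index then sees about one query per cell at a time rather than one per user.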
§3Step 4 — Wrap Up
| Decision | Choice | Why |
|---|---|---|
| Redis for Geospatial Queries | Redis has native geospatial data types that store latitude/longitude pairs and support radius queries (GEORADIUS, GEOSEARCH) in O(N+log M) time. | Finding places within 1km of the user requires geospatial indexing. |
| Postgres for Place Data | Postgres with the PostGIS extension stores the full, durable place database and supports SQL queries with geographic operators for complex spatial queries. | Proximity queries need two layers: fast 'what IDs are near me?' (Redis) and slower 'fetch the full data for these IDs' (Postgres). |
| Cache for Geo Queries | A Redis cache stores the results of proximity queries, keyed by geohash cell and search radius. | Spatial index scans are expensive. |
| Load Balancer | A load balancer routes geo proximity requests across multiple stateless API server instances. | At peak (e.g., event discovery during rush hour), search volume spikes 10x. |
Key design decisions