Open on desktop
Antimetal's interactive diagrams require a larger screen. Open this page on your laptop or desktop to continue.
Notification System
§1Step 2 — High-Level Design
Build a push notification system handling 10M daily active users across iOS, Android, and web.
Place Kafka or SQS between the API server and notification workers to buffer notification requests.
A message queue decouples notification producers (the API) from consumers (workers) by buffering messages durably until they can be processed.
Sending notifications (email via SendGrid, SMS via Twilio, push via FCM) involves slow external API calls taking 100-500ms each. Without a queue, the API would block on every notification, limiting throughput. The queue lets the API respond instantly while workers handle delivery at their own pace.
Notifications become eventually consistent — there's a small delay between request and delivery. But this is acceptable for notifications. Dead-letter queues handle delivery failures.
Uber's notification system processes 100M+ notifications/day through Kafka. Duolingo sends 100M push notifications/day via queued workers.
Kafka handles 1M+ messages/second. A single SQS queue supports 3,000 messages/second standard, 300/second FIFO.
Add worker services that consume from the message queue and send notifications via email, SMS, and push APIs.
Worker services are background processes that consume messages from the queue and execute the slow work — calling external notification APIs.
Workers scale independently from the API. If notifications queue up (e.g., a marketing blast), you can spin up 50 workers to process the backlog without changing the API tier.
Workers must handle idempotency — if a worker crashes mid-delivery, the message gets requeued and re-processed. The worker must check if the notification was already sent before re-sending.
Airbnb's notification workers are Sidekiq jobs in Ruby. Facebook's push notification workers process billions of messages/day. Twilio's internal worker fleet delivers SMS globally.
One worker can send ~10 notifications/second (limited by external API rate limits). For 1M notifications/hour, you need ~30 workers running continuously.
At high traffic, buffer notification delivery through a message queue so the delivery layer isn't overwhelmed by spikes.
A message queue (Kafka or SQS) buffers notification events between the notification API and the delivery workers.
At high traffic, notification spikes (e.g., breaking news, product launches) can overwhelm delivery workers. A queue absorbs the burst.
Queued notifications have delivery latency (seconds vs milliseconds). For time-sensitive alerts, use a high-priority fast lane.
Facebook uses Iris (Kafka-based) for notification queuing. Airbnb uses SQS for async notification delivery.
Kafka handles 1M+ events/second. A single topic can buffer millions of pending notifications.
At peak, serve notification media assets (images, thumbnails) from a CDN so delivery workers don't act as media proxies.
A CDN hosts media assets referenced in rich notifications, serving them from edge PoPs near the user.
At peak, millions of notification images being fetched from origin would saturate your bandwidth. CDN edges absorb this load.
CDN cache invalidation is needed when notification images change. Use content-addressed URLs (hash in filename) for immutable caching.
WhatsApp and Instagram use Fastly CDN for media in notifications. Asset URLs contain content hashes for long cache TTLs.
A CDN like Fastly serves terabytes per second globally. Image delivery is never the bottleneck.
§2Step 3 — Deep Dive
A message queue decouples notification producers (the API) from consumers (workers) by buffering messages durably until they can be processed.
| Channel | Latency | Delivery rate | Cost | Best for | Cost | Ops burden |
|---|---|---|---|---|---|---|
| Push (APNs/FCM) | <1s | ~90% (device online) | Free | Mobile apps, real-time alerts ✓ | Low | Medium |
| Email (SES/SendGrid) | 1–30s | ~95% inbox rate | $0.0001/email | Receipts, newsletters, digests | Low | Low |
| SMS (Twilio) | 1–5s | ~99% | $0.0075/SMS | OTP, critical alerts | High | Low |
| In-app (WebSocket) | <100ms | 100% (if connected) | Free | Chat, live updates | Medium | Medium |
| Webhook | <500ms | Depends on consumer | Free | B2B, developer integrations | Low | Low |
Notification delivery channels — pick based on urgency and user preference.
from kafka import KafkaProducer, KafkaConsumer
import json
producer = KafkaProducer(bootstrap_servers=['kafka:9092'],
value_serializer=lambda v: json.dumps(v).encode())
def send_notification(user_id: str, event_type: str, data: dict):
# Publish once — channel workers consume independently
producer.send('notifications', {
'user_id': user_id,
'event_type': event_type,
'data': data,
'channels': get_user_preferences(user_id), # ['push', 'email']
})
producer.flush() # <1ms — async, non-blocking for caller
# Each channel has its own consumer group — fully independent
class PushWorker:
def run(self):
consumer = KafkaConsumer('notifications',
group_id='push-workers',
bootstrap_servers=['kafka:9092'])
for msg in consumer:
payload = json.loads(msg.value)
if 'push' in payload['channels']:
self.send_apns(payload['user_id'], payload['data'])
class EmailWorker:
def run(self):
consumer = KafkaConsumer('notifications',
group_id='email-workers',
bootstrap_servers=['kafka:9092'])
for msg in consumer:
payload = json.loads(msg.value)
if 'email' in payload['channels']:
self.send_ses(payload['user_id'], payload['data'])| Component | Why Add It | Tradeoff |
|---|---|---|
| Message Queue | Sending notifications (email via SendGrid, SMS via Twilio, push via FCM) involves slow external API calls taking 100-500ms each. | Notifications become eventually consistent — there's a small delay between request and delivery. |
| Worker Services | Workers scale independently from the API. | Workers must handle idempotency — if a worker crashes mid-delivery, the message gets requeued and re-processed. |
| Message Queue | At high traffic, notification spikes (e. | Queued notifications have delivery latency (seconds vs milliseconds). |
| CDN for Rich Notifications | At peak, millions of notification images being fetched from origin would saturate your bandwidth. | CDN cache invalidation is needed when notification images change. |
Design decision tradeoffs
A marketing campaign triggers 10M notifications in 60 seconds. Worker nodes (worker-1) queue fills, latency spikes to 5 minutes for delivery. How do you handle burst buffering, priority lanes, and back-pressure on the MQ to degrade gracefully?
worker-1 crashes with 100K in-flight notification deliveries. If messages aren't ACKed, they must be re-queued. How do you implement dead-letter queues, retry policies, and deduplication to avoid duplicate notifications?
A single event (celebrity goes live) triggers push notifications to 50M followers simultaneously. Third-party push services (FCM, APNs) rate-limit the delivery. How do you batch, stagger, and prioritize notifications to respect rate limits?
§3Step 4 — Wrap Up
| Decision | Choice | Why |
|---|---|---|
| Message Queue | A message queue decouples notification producers (the API) from consumers (workers) by buffering messages durably until they can be processed. | Sending notifications (email via SendGrid, SMS via Twilio, push via FCM) involves slow external API calls taking 100-500ms each. |
| Worker Services | Worker services are background processes that consume messages from the queue and execute the slow work — calling external notification APIs. | Workers scale independently from the API. |
| Message Queue | A message queue (Kafka or SQS) buffers notification events between the notification API and the delivery workers. | At high traffic, notification spikes (e. |
| CDN for Rich Notifications | A CDN hosts media assets referenced in rich notifications, serving them from edge PoPs near the user. | At peak, millions of notification images being fetched from origin would saturate your bandwidth. |
Key design decisions