Notification System

Beginner · Messaging, Mobile · 35 min read


Step 2: High-Level Design

Build a push notification system handling 10M daily active users across iOS, Android, and web.

System architecture overview


Add a Message Queue

Place Kafka or SQS between the API server and notification workers to buffer notification requests.

What it does

A message queue decouples notification producers (the API) from consumers (workers) by buffering messages durably until they can be processed.

Why it matters

Sending notifications (email via SendGrid, SMS via Twilio, push via FCM) involves slow external API calls taking 100-500ms each. Without a queue, the API would block on every notification, limiting throughput. The queue lets the API respond instantly while workers handle delivery at their own pace.
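The decoupling described above can be sketched with Python's standard library alone. This is a minimal in-process illustration, not a production design: `queue.Queue` stands in for Kafka or SQS, and `slow_provider_call`, `handle_api_request`, and the 10 ms sleep are hypothetical stand-ins for a real provider call and API handler.

```python
import queue
import threading
import time

notification_queue = queue.Queue()  # stand-in for Kafka/SQS (assumption)
delivered = []

def slow_provider_call(message):
    # Hypothetical provider call; real calls take roughly 100-500 ms
    time.sleep(0.01)
    delivered.append(message)

def handle_api_request(user_id, text):
    # The API only enqueues and returns; it never waits on the provider
    notification_queue.put({"user_id": user_id, "text": text})
    return "accepted"

def worker():
    while True:
        message = notification_queue.get()
        if message is None:  # sentinel tells the worker to stop
            notification_queue.task_done()
            break
        slow_provider_call(message)
        notification_queue.task_done()

thread = threading.Thread(target=worker, daemon=True)
thread.start()

statuses = [handle_api_request(f"user-{i}", "hello") for i in range(5)]

notification_queue.join()     # wait until the worker has drained the queue
notification_queue.put(None)  # then shut the worker down
thread.join()
```

All five requests are accepted before the worker has finished delivering them, which is the property the queue is meant to provide.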

Trade-off

Delivery becomes asynchronous: there is a small delay between the request and the notification reaching the user, which is acceptable for notifications. Messages that still fail after retries go to a dead-letter queue for inspection.
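One way to picture the dead-letter queue is as a second list that receives messages once their retry budget is spent. The sketch below assumes a retry budget of three attempts and uses a hypothetical `flaky_deliver` function in place of a real provider client; managed queues such as SQS implement this policy for you through configuration.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget, not a value from the lesson

def process_with_dlq(messages, deliver, max_attempts=MAX_ATTEMPTS):
    """Deliver each message, requeueing failures until the budget is spent."""
    main_queue = deque({"body": m, "attempts": 0} for m in messages)
    delivered, dead_letters = [], []
    while main_queue:
        envelope = main_queue.popleft()
        envelope["attempts"] += 1
        try:
            deliver(envelope["body"])
            delivered.append(envelope["body"])
        except Exception as error:
            if envelope["attempts"] >= max_attempts:
                # Park the message for inspection instead of retrying forever
                envelope["last_error"] = str(error)
                dead_letters.append(envelope)
            else:
                main_queue.append(envelope)
    return delivered, dead_letters

def flaky_deliver(body):
    # Hypothetical provider call that always rejects one payload
    if body == "bad-token":
        raise ValueError("invalid device token")

delivered, dead_letters = process_with_dlq(
    ["hello", "bad-token", "world"], flaky_deliver
)
```

Healthy messages are delivered on the first attempt, while the poisoned one ends up in `dead_letters` with its attempt count and last error attached.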

Real world

Uber's notification system processes 100M+ notifications/day through Kafka. Duolingo sends 100M push notifications/day via queued workers.

Capacity math

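A well-provisioned Kafka cluster can sustain 1M+ messages/second. SQS standard queues support a nearly unlimited number of API calls per second; FIFO queues default to 300 API calls/second per action, or up to 3,000 messages/second when batching 10 messages per call.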

Add Worker Services

Add worker services that consume from the message queue and send notifications via email, SMS, and push APIs.

What it does

Worker services are background processes that consume messages from the queue and execute the slow work — calling external notification APIs.

Why it matters

Workers scale independently from the API. If notifications queue up (e.g., a marketing blast), you can spin up 50 workers to process the backlog without changing the API tier.

Trade-off

Workers must handle idempotency — if a worker crashes mid-delivery, the message gets requeued and re-processed. The worker must check if the notification was already sent before re-sending.
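A minimal sketch of that check, assuming every message carries a unique `notification_id` (a hypothetical field name) and that an in-memory set stands in for a shared store such as Redis or a database table:

```python
class IdempotentSender:
    """Skips messages whose notification_id has already been delivered."""

    def __init__(self, send):
        self._send = send
        self._sent_ids = set()  # stand-in for a shared store (assumption)

    def handle(self, message):
        notification_id = message["notification_id"]
        if notification_id in self._sent_ids:
            return False  # redelivered message, already sent
        self._send(message)
        self._sent_ids.add(notification_id)
        return True

outbox = []
sender = IdempotentSender(outbox.append)
message = {"notification_id": "n-1", "text": "Your order shipped"}

first = sender.handle(message)   # delivered
second = sender.handle(message)  # same message requeued after a crash
```

Because the ID is recorded only after the send succeeds, a crash between the two steps can still produce one duplicate. This gives at-least-once delivery with deduplication, not exactly-once delivery.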

Real world

Airbnb's notification workers are Sidekiq jobs in Ruby. Facebook's push notification workers process billions of messages/day. Twilio's internal worker fleet delivers SMS globally.

Capacity math

One worker can send ~10 notifications/second (limited by external API rate limits). For 1M notifications/hour, you need ~30 workers running continuously.
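The arithmetic behind that estimate can be checked in a few lines. The function name is illustrative; the inputs are the figures quoted above.

```python
import math

def workers_needed(notifications_per_hour, per_worker_rate_per_sec):
    """Minimum worker count to keep up with a steady notification rate."""
    required_rate_per_sec = notifications_per_hour / 3600
    return math.ceil(required_rate_per_sec / per_worker_rate_per_sec)

baseline = workers_needed(1_000_000, 10)
```

1M notifications/hour is about 278 per second, so the strict minimum is 28 workers. Rounding up to ~30 leaves a little headroom for retries and uneven load.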

Add a Message Queue

At high traffic, buffer notification delivery through a message queue so the delivery layer isn't overwhelmed by spikes.

What it does

A message queue (Kafka or SQS) buffers notification events between the notification API and the delivery workers.

Why it matters

At high traffic, notification spikes (e.g., breaking news, product launches) can overwhelm delivery workers. A queue absorbs the burst.

Trade-off

Queued notifications have delivery latency (seconds vs milliseconds). For time-sensitive alerts, use a high-priority fast lane.
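A fast lane can be illustrated with a priority heap. This is only a sketch of the ordering behavior: in a real deployment the lanes would more likely be separate topics or queues with their own workers, and the `HIGH`/`NORMAL` levels and class name here are assumptions.

```python
import heapq
import itertools

HIGH, NORMAL = 0, 1  # lower number is served first

class PriorityLanes:
    def __init__(self):
        self._heap = []
        # Monotonic counter keeps FIFO order within the same priority
        self._counter = itertools.count()

    def publish(self, message, priority=NORMAL):
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def next_message(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

lanes = PriorityLanes()
lanes.publish("weekly digest")
lanes.publish("promo code")
lanes.publish("login OTP", priority=HIGH)

order = [lanes.next_message() for _ in range(3)]
```

The OTP is published last but consumed first, while the two normal-priority messages keep their original order.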

Real world

Facebook Messenger uses Iris, an internally built ordered queue, to buffer message updates. Airbnb uses SQS for async notification delivery.

Capacity math

Kafka handles 1M+ events/second. A single topic can buffer millions of pending notifications.

Add a CDN for Rich Notifications

At peak, serve notification media assets (images, thumbnails) from a CDN so delivery workers don't act as media proxies.

What it does

A CDN hosts media assets referenced in rich notifications, serving them from edge PoPs near the user.

Why it matters

At peak, millions of notification images being fetched from origin would saturate your bandwidth. CDN edges absorb this load.

Trade-off

CDN cache invalidation is needed when notification images change. Use content-addressed URLs (hash in filename) for immutable caching.
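Content-addressed URLs can be derived directly from the image bytes. In this sketch the CDN hostname and path are hypothetical, and truncating the SHA-256 digest to 16 hex characters is an arbitrary choice for readability.

```python
import hashlib

CDN_BASE_URL = "https://cdn.example.com/notifications"  # hypothetical host

def content_addressed_url(image_bytes: bytes, extension: str = "png") -> str:
    """Build a URL whose filename changes whenever the content changes."""
    digest = hashlib.sha256(image_bytes).hexdigest()[:16]
    return f"{CDN_BASE_URL}/{digest}.{extension}"

original_url = content_addressed_url(b"thumbnail-v1")
same_url = content_addressed_url(b"thumbnail-v1")
updated_url = content_addressed_url(b"thumbnail-v2")
```

Identical bytes always map to the same URL, and any change to the image yields a new URL. The old object can therefore be cached with a long TTL and never needs explicit invalidation.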

Real world

WhatsApp and Instagram use Fastly CDN for media in notifications. Asset URLs contain content hashes for long cache TTLs.

Capacity math

A large CDN such as Fastly has terabytes per second of global edge capacity, so image delivery is rarely the bottleneck.

Step 3: Deep Dive

Channel	Latency	Delivery rate	Cost per message	Best for	Relative cost	Ops burden
Push (APNs/FCM)	<1s	~90% (device online)	Free	Mobile apps, real-time alerts ✓	Low	Medium
Email (SES/SendGrid)	1–30s	~95% inbox rate	$0.0001/email	Receipts, newsletters, digests	Low	Low
SMS (Twilio)	1–5s	~99%	$0.0075/SMS	OTP, critical alerts	High	Low
In-app (WebSocket)	<100ms	100% (if connected)	Free	Chat, live updates	Medium	Medium
Webhook	<500ms	Depends on consumer	Free	B2B, developer integrations	Low	Low

Notification delivery channels — pick based on urgency and user preference.

Async notification pipeline: Kafka fan-out to channel workers (Python)

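from kafka import KafkaProducer, KafkaConsumer
import json

producer = KafkaProducer(bootstrap_servers=['kafka:9092'],
                         value_serializer=lambda v: json.dumps(v).encode())

def get_user_preferences(user_id: str) -> list:
    # Placeholder: a real system would read the user's opted-in channels
    # from a preferences store
    return ['push', 'email']

def send_notification(user_id: str, event_type: str, data: dict):
    # Publish once; each channel's workers consume independently
    producer.send('notifications', {
        'user_id': user_id,
        'event_type': event_type,
        'data': data,
        'channels': get_user_preferences(user_id),  # e.g. ['push', 'email']
    })
    # send() only buffers the record and returns a future, so the caller is
    # not blocked. flush() would block until every buffered record has been
    # sent, so call it at shutdown rather than on every request.

# Each channel has its own consumer group, so every group sees every message
class PushWorker:
    def send_apns(self, user_id: str, data: dict):
        raise NotImplementedError  # placeholder for the APNs/FCM call

    def run(self):
        consumer = KafkaConsumer('notifications',
                                 group_id='push-workers',
                                 bootstrap_servers=['kafka:9092'])
        for msg in consumer:
            payload = json.loads(msg.value)  # msg.value is raw bytes
            if 'push' in payload['channels']:
                self.send_apns(payload['user_id'], payload['data'])

class EmailWorker:
    def send_ses(self, user_id: str, data: dict):
        raise NotImplementedError  # placeholder for the SES/SendGrid call

    def run(self):
        consumer = KafkaConsumer('notifications',
                                 group_id='email-workers',
                                 bootstrap_servers=['kafka:9092'])
        for msg in consumer:
            payload = json.loads(msg.value)  # msg.value is raw bytes
            if 'email' in payload['channels']:
                self.send_ses(payload['user_id'], payload['data'])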

Component	Why Add It	Tradeoff
Message Queue	Sending notifications (email via SendGrid, SMS via Twilio, push via FCM) involves slow external API calls taking 100-500ms each.	Notifications become eventually consistent — there's a small delay between request and delivery.
Worker Services	Workers scale independently from the API.	Workers must handle idempotency — if a worker crashes mid-delivery, the message gets requeued and re-processed.
Message Queue	At high traffic, notification spikes (e.g., breaking news, product launches) can overwhelm delivery workers.	Queued notifications have delivery latency (seconds vs milliseconds).
CDN for Rich Notifications	At peak, millions of notification images being fetched from origin would saturate your bandwidth.	CDN cache invalidation is needed when notification images change.

Design decision tradeoffs

Notification Burst

A marketing campaign triggers 10M notifications in 60 seconds. The queue feeding the worker nodes (worker-1) fills up, and delivery latency spikes to 5 minutes. How do you handle burst buffering, priority lanes, and back-pressure on the message queue to degrade gracefully?
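A rough sizing calculation helps frame this scenario. The sketch below reuses the lesson's earlier figures of ~10 notifications/second per worker and a 30-worker fleet; the function names are illustrative, and the model ignores retries and provider rate limits.

```python
import math

def drain_seconds(backlog, workers, per_worker_rate_per_sec):
    """Time to empty a backlog at a fixed aggregate delivery rate."""
    return backlog / (workers * per_worker_rate_per_sec)

def workers_for_deadline(backlog, deadline_seconds, per_worker_rate_per_sec):
    """Workers needed to empty a backlog within the deadline."""
    return math.ceil(backlog / (deadline_seconds * per_worker_rate_per_sec))

BACKLOG = 10_000_000

steady_fleet_drain = drain_seconds(BACKLOG, workers=30, per_worker_rate_per_sec=10)
fleet_for_5_minutes = workers_for_deadline(BACKLOG, deadline_seconds=300,
                                           per_worker_rate_per_sec=10)
```

Under these assumptions the steady-state fleet would need about 33,000 seconds (over 9 hours) to drain the burst, and holding latency to 5 minutes would take roughly 3,300 workers. That gap is why bursts are usually handled by a mix of autoscaling, priority lanes, and deliberately slower delivery for low-priority campaign traffic.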

Notification Worker Crash

worker-1 crashes with 100K in-flight notification deliveries. If messages aren't ACKed, they must be re-queued. How do you implement dead-letter queues, retry policies, and deduplication to avoid duplicate notifications?

Celebrity Push Fan-Out

A single event (celebrity goes live) triggers push notifications to 50M followers simultaneously. Third-party push services (FCM, APNs) rate-limit the delivery. How do you batch, stagger, and prioritize notifications to respect rate limits?
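Batching and staggering can be sketched as a schedule of send times computed up front. The batch size and the batches-per-second limit below are made-up values for illustration, not real FCM or APNs quotas. Prioritization can be layered on by sorting `recipient_ids` (for example, most active users first) before planning.

```python
def plan_staggered_batches(recipient_ids, batch_size, max_batches_per_second):
    """Return (send_at_seconds, batch) pairs spaced to respect a rate limit."""
    interval = 1.0 / max_batches_per_second
    schedule = []
    for start in range(0, len(recipient_ids), batch_size):
        batch = recipient_ids[start:start + batch_size]
        send_at = (start // batch_size) * interval
        schedule.append((send_at, batch))
    return schedule

followers = [f"user-{i}" for i in range(10)]
schedule = plan_staggered_batches(followers, batch_size=4,
                                  max_batches_per_second=2)
```

Ten recipients become three batches sent at 0.0, 0.5, and 1.0 seconds, so no more than two batches leave per second.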

Never deliver notifications synchronously in the request path — it makes the user's request wait for all notifications to send. Add a Message Queue node.

The API server publishes a "notification event" to the queue and immediately returns to the user. Add a Worker Service that consumes from the queue and delivers notifications.

Connect: API Server → Message Queue → Worker Service. The worker service handles retries, rate limiting to providers (APNs, FCM, SES), and delivery tracking.
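The retry handling mentioned above is commonly implemented as exponential backoff with jitter. The sketch below is one such policy under assumed parameters (0.5 s base, 30 s cap, four attempts); `send` is any hypothetical callable that raises on failure, and `sleep` is injectable so the logic can be exercised without waiting.

```python
import random
import time

def backoff_delays(max_retries, base_seconds=0.5, cap_seconds=30.0, rng=None):
    """Randomized delays whose upper bound doubles with each retry."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap_seconds, base_seconds * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))  # jitter spreads out retries
    return delays

def deliver_with_retry(send, message, max_attempts=4, sleep=time.sleep, rng=None):
    """Call send(message), retrying with backoff; re-raise after the last try."""
    delays = backoff_delays(max_attempts - 1, rng=rng)
    for attempt in range(max_attempts):
        try:
            return send(message)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(delays[attempt])
```

The jitter matters when many workers fail at once: without it they would all retry at the same instants and hit the provider's rate limit together.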

Step 4: Wrap Up

Decision	Choice	Why
Message Queue	A message queue decouples notification producers (the API) from consumers (workers) by buffering messages durably until they can be processed.	Sending notifications (email via SendGrid, SMS via Twilio, push via FCM) involves slow external API calls taking 100-500ms each.
Worker Services	Worker services are background processes that consume messages from the queue and execute the slow work — calling external notification APIs.	Workers scale independently from the API.
Message Queue	A message queue (Kafka or SQS) buffers notification events between the notification API and the delivery workers.	At high traffic, notification spikes (e.g., breaking news, product launches) can overwhelm delivery workers.
CDN for Rich Notifications	A CDN hosts media assets referenced in rich notifications, serving them from edge PoPs near the user.	At peak, millions of notification images being fetched from origin would saturate your bandwidth.

Key design decisions

If the interviewer asks to scale 10×: Fan out without falling over. Partition your message stream by a key that balances consumer parallelism (topic + user shard, not random).
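Sharding by user can be sketched as a stable hash of the user ID. The function name and partition count are assumptions; a stable digest is used because Python's built-in `hash()` for strings is randomized per process and would route the same user differently on each worker.

```python
import hashlib

def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user to the same partition on every producer and every run."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

counts = [0] * 8
for i in range(10_000):
    counts[partition_for(f"user-{i}", 8)] += 1
```

Keying by user keeps one user's notifications ordered within a single partition while spreading users roughly evenly across consumers. In practice a Kafka producer does similar hashing on the message key, so passing the user ID as the key is often enough.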

10× target: 10K RPS, where your architecture must hold
