Event-Driven Architecture

Step 2: High-Level Design

Replace synchronous calls with events. Design event schemas, ordering guarantees, and consumer groups.

System architecture overview: a progressive build in four stages, starting from the problem state and adding one component per step.

Add an Event Bus

Place an Event Bus between the order service and downstream consumers (inventory, email, analytics) to decouple them.

What it does

An Event Bus is a publish-subscribe messaging system where producers emit events and any number of consumers receive them independently.

Why it matters

Without an event bus, order-svc must call inventory-svc, email-svc, and analytics-svc synchronously. If any downstream service is slow or down, order processing fails. The event bus decouples producers from consumers — order-svc doesn't know or care who's listening.
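To make the decoupling concrete, here is a minimal in-process sketch. The `EventBus` class and the handler names are hypothetical stand-ins for a real broker such as Kafka or SNS. The point is that the publisher makes one call and a failing subscriber does not stop the others.

```python
from collections import defaultdict


class EventBus:
    """Toy in-process pub/sub bus (hypothetical; stands in for a real broker)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # event type -> list of handlers

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Deliver to every subscriber; return how many handlers succeeded."""
        delivered = 0
        for handler in self._subscribers[event_type]:
            try:
                handler(payload)
                delivered += 1
            except Exception:
                pass  # a real bus would retry or dead-letter the event here
        return delivered


bus = EventBus()
reserved, emailed = [], []
bus.subscribe("order.created", lambda e: reserved.append(e["order_id"]))
bus.subscribe("order.created", lambda e: emailed.append(e["order_id"]))


def broken_analytics(event):
    raise RuntimeError("analytics-svc is down")


bus.subscribe("order.created", broken_analytics)

# order-svc publishes once and does not know who is listening.
delivered = bus.publish("order.created", {"order_id": 42})
```

Inventory and email still receive the event even though the analytics handler fails, which is the property the synchronous design lacks.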

Trade-off

Event-driven systems are harder to trace and debug (no synchronous call stack). You need event schema versioning and dead-letter queues for failed processing.
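The sketch below shows one way the two safeguards can fit together. It assumes a `schema_version` field on every event and a retry budget of three attempts; neither is specified by the lesson. Events that still fail after the last attempt are parked in a dead-letter list instead of blocking the stream.

```python
MAX_ATTEMPTS = 3            # assumed retry budget
SUPPORTED_VERSIONS = {1, 2}  # assumed schema versions this consumer understands


def handle_order_created(event):
    """Reject events whose schema version this consumer cannot read."""
    version = event.get("schema_version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported schema_version {version}")
    # business logic would go here


def process_with_dlq(events, handler, dead_letters, max_attempts=MAX_ATTEMPTS):
    """Try each event up to max_attempts; park persistent failures in dead_letters."""
    processed = []
    for event in events:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(event)
                processed.append(event["id"])
                break
            except Exception as exc:
                if attempt == max_attempts:
                    dead_letters.append(
                        {"event": event, "error": str(exc), "attempts": attempt}
                    )
    return processed


dlq = []
events = [
    {"id": "e1", "schema_version": 1, "order_id": 1},
    {"id": "e2", "schema_version": 9, "order_id": 2},  # unknown version
    {"id": "e3", "schema_version": 2, "order_id": 3},
]
done = process_with_dlq(events, handle_order_created, dlq)
```

The bad event ends up in `dlq` with its error message, and the events behind it are still processed.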

Real world

Shopify processes order events through Kafka. Airbnb uses event-driven architecture for booking confirmations. Amazon uses event-driven messaging (SNS/SQS) widely across its microservices.

Capacity math

Kafka handles 1M+ events/second per cluster. Each service gets its own consumer group, reading independently at its own pace.
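A toy model of "reading independently at its own pace": one shared append-only log with a separate read offset per consumer group. The `EventLog` class is a hypothetical simplification that ignores partitions and persistence.

```python
class EventLog:
    """Append-only log with one read offset per consumer group (toy model)."""

    def __init__(self):
        self._events = []
        self._offsets = {}  # group name -> index of the next event to read

    def append(self, event):
        self._events.append(event)

    def poll(self, group, max_events):
        """Return up to max_events unread events for this group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch

    def lag(self, group):
        """Number of events this group has not read yet."""
        return len(self._events) - self._offsets.get(group, 0)


log = EventLog()
for order_id in range(5):
    log.append({"type": "order.created", "order_id": order_id})

fast = log.poll("inventory-svc", max_events=5)  # keeps up with producers
slow = log.poll("analytics-svc", max_events=2)  # falls behind but loses nothing
```

The slow group's lag grows, yet the fast group is unaffected, because each group only moves its own offset.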

Add a Message Queue

At high traffic, add a durable message queue between producers and consumers to decouple them and handle backpressure.

What it does

A message queue stores events durably and delivers them to consumers at a rate they can handle.

Why it matters

Without a queue, a traffic spike overwhelms consumers. The queue provides backpressure, buffering, and at-least-once delivery guarantees.
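A small sketch of buffering and backpressure using Python's standard `queue.Queue`. The capacity of 3 is an arbitrary illustrative value. A durable broker buffers on disk rather than in memory, so treat this only as a model of the behavior.

```python
import queue

buffer = queue.Queue(maxsize=3)  # illustrative capacity, not a recommendation


def try_publish(event):
    """Return False instead of blocking when the buffer is full (backpressure signal)."""
    try:
        buffer.put_nowait(event)
        return True
    except queue.Full:
        return False


# A burst of five events arrives while the consumer is busy.
accepted = [try_publish({"order_id": i}) for i in range(5)]

# Later the consumer drains the buffer at its own pace.
drained = []
while not buffer.empty():
    drained.append(buffer.get_nowait()["order_id"])
```

The producer learns immediately that the last two events were not accepted and can slow down or retry, while the consumer never sees more than it can hold.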

Trade-off

Messages may be processed out of order. Partition by key (user_id, entity_id) to guarantee per-entity ordering.
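Key-based partitioning can be sketched as a stable hash of the key modulo the partition count. The `partition_for` helper below is hypothetical and uses MD5 for simplicity; Kafka's default partitioner uses a different hash, but the idea is the same.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash, so equal keys always collide."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


NUM_PARTITIONS = 4
partitions = {p: [] for p in range(NUM_PARTITIONS)}

events = [
    ("user-1", "created"),
    ("user-2", "created"),
    ("user-1", "paid"),
    ("user-1", "shipped"),
]
for user_id, action in events:
    partitions[partition_for(user_id, NUM_PARTITIONS)].append((user_id, action))

user_1_partition = partitions[partition_for("user-1", NUM_PARTITIONS)]
user_1_actions = [action for user, action in user_1_partition if user == "user-1"]
```

All events for `user-1` land on one partition in publish order. Ordering across different users is not guaranteed, which is usually acceptable.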

Real world

Kafka powers event pipelines at LinkedIn, Airbnb, and Uber. SQS is the AWS equivalent for simpler use cases.

Capacity math

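Kafka throughput scales with brokers and partitions. A small 3-broker cluster can handle on the order of 1M events/second for small messages; actual numbers depend on message size, replication factor, and hardware.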

Add a Load Balancer for Consumers

At peak, scale consumer workers horizontally behind a load balancer to process the event backlog faster.

What it does

A load balancer (or Kafka consumer group) distributes queue partitions across multiple consumer worker instances.

Why it matters

At peak, a single consumer can't process events fast enough. Adding workers scales throughput linearly up to the partition count.
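The "up to the partition count" limit follows from how partitions are assigned: each partition goes to exactly one worker in a group. A round-robin sketch (the `assign_partitions` helper is hypothetical; real consumer groups use configurable assignment strategies):

```python
def assign_partitions(partitions, workers):
    """Round-robin partitions over workers; each partition has exactly one owner."""
    if not workers:
        raise ValueError("need at least one worker")
    assignment = {worker: [] for worker in workers}
    for index, partition in enumerate(sorted(partitions)):
        assignment[workers[index % len(workers)]].append(partition)
    return assignment


balanced = assign_partitions(range(6), ["w1", "w2", "w3"])
over_provisioned = assign_partitions(range(2), ["w1", "w2", "w3"])
```

With two partitions and three workers, the third worker receives nothing, so adding workers beyond the partition count adds no throughput.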

Trade-off

Consumer lag may grow during sudden spikes. Monitor consumer lag and auto-scale based on lag metrics.
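A lag-based scaling rule might look like the following sketch. The function name, the 60-second drain target, and the per-worker rate are assumptions for illustration, and the rule ignores the incoming event rate for simplicity.

```python
import math


def desired_workers(total_lag, events_per_worker_per_sec, target_drain_seconds,
                    num_partitions, min_workers=1):
    """Workers needed to drain the backlog within the target, capped at the partition count."""
    capacity_per_worker = events_per_worker_per_sec * target_drain_seconds
    needed = math.ceil(total_lag / capacity_per_worker)
    return max(min_workers, min(needed, num_partitions))


moderate = desired_workers(6_000_000, 50_000, 60, num_partitions=20)
spike = desired_workers(120_000_000, 50_000, 60, num_partitions=20)
idle = desired_workers(0, 50_000, 60, num_partitions=20)
```

During the spike the rule asks for 40 workers but is capped at 20, one per partition, so very large backlogs also require more partitions or a longer drain window.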

Real world

Netflix runs Flink and Spark Streaming jobs that consume from Kafka, and consumer lag is a common auto-scaling signal for such consumers. AWS Lambda can scale to thousands of concurrent executions.

Capacity math

Each consumer partition processes ~50K events/second. 20 partitions = 1M events/second consumer capacity.
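The same arithmetic as a small helper, useful for trying other targets. The `headroom` multiplier is an added assumption for burst capacity and is not part of the lesson's figures.

```python
import math


def partitions_needed(target_events_per_sec, events_per_partition_per_sec, headroom=1.0):
    """Partitions required to sustain the target rate, with an optional safety margin."""
    return math.ceil(target_events_per_sec * headroom / events_per_partition_per_sec)


baseline = partitions_needed(1_000_000, 50_000)                    # 1M / 50K
with_headroom = partitions_needed(1_000_000, 50_000, headroom=1.5)  # 1.5M / 50K
```

The baseline reproduces the 20 partitions quoted above, and a 1.5x margin raises it to 30.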

Step 3: Deep Dive

Pattern	Coupling	Message replay?	Fan-out	Best for	Cost	Ops burden
Direct call (sync)	Tight	No	No	Simple 2-service flows	Low	Low
Task queue (RabbitMQ)	Loose	No (acked = gone)	Partial	Job queues, work distribution	Medium	Medium
Event stream (Kafka)	Loose	Yes (log retention)	Yes (consumer groups)	Event sourcing, audit log ✓	High	High
Pub/Sub (Redis)	Loose	No (fire-and-forget)	Yes	Real-time notifications, low volume	Medium	Medium
Outbox pattern	Loose	Yes	Yes	At-least-once with DB consistency	Medium	Medium

Messaging patterns — Kafka for event streaming, RabbitMQ for task queues.

Outbox pattern: transactional event publishing with Postgres + Kafka (Python)

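# Outbox pattern: write the event to the DB in the same transaction as the business data.
# A poller relays outbox rows to Kafka, which gives at-least-once delivery: a crash
# between produce and UPDATE re-sends the row, so consumers must be idempotent.
# Assumes db_conn is a psycopg 3 connection, kafka_producer is a confluent-kafka
# Producer, and outbox.payload is a TEXT column.

import json

def place_order(order_data: dict, db_conn):
    with db_conn.transaction():
        # 1. Write business data
        order_id = db_conn.execute(
            "INSERT INTO orders (user_id, total) VALUES (%s, %s) RETURNING id",
            (order_data['user_id'], order_data['total'])
        ).fetchone()[0]

        # 2. Write event to outbox IN THE SAME TRANSACTION
        db_conn.execute(
            """INSERT INTO outbox (aggregate_id, event_type, payload, status)
               VALUES (%s, 'OrderPlaced', %s, 'pending')""",
            (order_id, json.dumps({'order_id': order_id, **order_data}))
        )
    # Transaction commits atomically: both rows or neither
    return order_id

# Outbox relay runs every 100ms
def relay_outbox(db_conn, kafka_producer):
    with db_conn.transaction():
        # SKIP LOCKED lets several relay instances run without sending the same row twice
        rows = db_conn.execute(
            "SELECT id, aggregate_id, payload FROM outbox WHERE status = 'pending' "
            "ORDER BY id LIMIT 100 FOR UPDATE SKIP LOCKED"
        ).fetchall()
        for _row_id, aggregate_id, payload in rows:
            # Key by aggregate so all events for one order land on the same partition
            kafka_producer.produce('orders', key=str(aggregate_id), value=payload)
        # Block until outstanding messages are delivered or have failed;
        # a production relay would also check delivery callbacks before marking rows.
        kafka_producer.flush()
        for row_id, _aggregate_id, _payload in rows:
            db_conn.execute("UPDATE outbox SET status='sent' WHERE id=%s", (row_id,))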

Component	Why Add It	Tradeoff
Event Bus	Without an event bus, order-svc must call inventory-svc, email-svc, and analytics-svc synchronously.	Event-driven systems are harder to trace and debug (no synchronous call stack).
Message Queue	Without a queue, a traffic spike overwhelms consumers.	Messages may be processed out of order.
Load Balancer for Consumers	At peak, a single consumer can't process events fast enough.	Consumer lag may grow during sudden spikes.

Design decision tradeoffs

Inventory Service Crash

inventory-svc crashes after receiving an 'order.created' event but before publishing 'inventory.reserved'. The event is lost. Orders are created but inventory never reserved. How do you ensure at-least-once delivery and idempotency?
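At-least-once delivery means the broker redelivers the event after the crash, so the consumer has to tolerate duplicates. A minimal idempotency sketch, assuming every event carries a unique `event_id` (a hypothetical field) and using an in-memory set where a real service would use a database table updated in the same transaction as the stock change:

```python
class InventoryConsumer:
    """Idempotent consumer: applying the same event twice has no extra effect."""

    def __init__(self, stock):
        self.stock = dict(stock)
        self.processed_event_ids = set()  # stand-in for a durable dedup table

    def handle(self, event):
        """Return True if the event changed state, False if it was a duplicate."""
        if event["event_id"] in self.processed_event_ids:
            return False
        self.stock[event["sku"]] -= event["quantity"]
        self.processed_event_ids.add(event["event_id"])
        return True


consumer = InventoryConsumer({"sku-1": 10})
event = {"event_id": "evt-123", "sku": "sku-1", "quantity": 2}

first = consumer.handle(event)
redelivered = consumer.handle(event)  # broker redelivers after a crash before ack
```

Stock is reduced once even though the event arrives twice, which is what makes redelivery safe.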

Event Storm Fan-Out

A single high-volume event triggers cascading fan-out: one 'order.created' event triggers inventory, email, analytics, shipping — all simultaneously. Downstream services are overwhelmed. How do you apply back-pressure or prioritize consumers?

Message Broker Partition

The message broker loses quorum: some brokers can receive events but not replicate them. Producers think writes succeeded; consumers don't see them. How do you detect and recover from split-brain message loss?
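One common mitigation is to make a write count as successful only after it has been replicated. The sketch below lists the relevant Kafka settings as plain Python dicts (property names as used by librdkafka and confluent-kafka; the broker host names are hypothetical) and adds a small hypothetical check of whether a combination survives the loss of one broker.

```python
# Producer settings: wait for all in-sync replicas and deduplicate retries.
producer_config = {
    "bootstrap.servers": "broker-1:9092,broker-2:9092,broker-3:9092",  # hypothetical hosts
    "acks": "all",
    "enable.idempotence": True,
}

# Topic-level setting: refuse writes unless at least 2 replicas are in sync.
topic_config = {"min.insync.replicas": "2"}
REPLICATION_FACTOR = 3


def tolerates_broker_loss(acks, min_isr, replication_factor):
    """True if an acked write is on at least 2 replicas and one broker can fail without blocking writes."""
    return acks == "all" and 2 <= min_isr < replication_factor


safe = tolerates_broker_loss(
    producer_config["acks"],
    int(topic_config["min.insync.replicas"]),
    REPLICATION_FACTOR,
)
```

With these settings a broker that cannot replicate rejects the write instead of acknowledging it, so the producer sees an error and can retry rather than silently losing the event.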

The current architecture is tightly coupled — Order Service directly calls all downstream services. If Email Service is slow, it slows Order Service. Add an Event Bus node.

Order Service should publish one event: "OrderPlaced". Each downstream service subscribes to this event independently. Order Service does not know or care what happens after publishing.

Remove the direct connections from Order Service to downstream services. Connect: Order Service → Event Bus. Event Bus → Inventory Service, Email Service, Analytics Service. Order Service is now decoupled from its consumers.

Step 4: Wrap Up

Decision	Choice	Why
Event Bus	An Event Bus is a publish-subscribe messaging system where producers emit events and any number of consumers receive them independently.	Without an event bus, order-svc must call inventory-svc, email-svc, and analytics-svc synchronously.
Message Queue	A message queue stores events durably and delivers them to consumers at a rate they can handle.	Without a queue, a traffic spike overwhelms consumers.
Load Balancer for Consumers	A load balancer (or Kafka consumer group) distributes queue partitions across multiple consumer worker instances.	At peak, a single consumer can't process events fast enough.

Key design decisions

If the interviewer asks to scale 10×: fan out without falling over. Partition the message stream by a key that preserves per-entity ordering while spreading load evenly across consumers (for example, a user shard within each topic rather than random assignment).

10× target: 20K RPS, where your architecture must hold.
