Open on desktop

Antimetal's interactive diagrams require a larger screen. Open this page on your laptop or desktop to continue.

Best on desktop

Back to lesson

Distributed Job Scheduler

intermediateSchedulingReliability

Patterns·45 min read

Distributed Job Scheduler

1Understand the Problem & Establish Design Scope→2High-Level Design→3Deep Dive→4Wrap Up

SchedulingReliability

§1Step 2 — High-Level Design

2High-Level Design

Build a cron-like system that runs millions of jobs reliably. Idempotency, retry backoff, and dead-letter queues.

System architecture overview

Interactive diagram locked

Upgrade to Pro to build and run this system.

Stage 1 of 4Starting state — the problem to solve

Progressive build — add each component step by step

Interactive diagram locked

Upgrade to Pro to build and run this system.

Add a Second Job Scheduler for HA

Add a second job scheduler instance to ensure no scheduled jobs are lost if one scheduler crashes.

What it does

A second job scheduler instance runs in hot-standby mode. Both schedulers compete for a distributed lock; the winner becomes leader and executes jobs, while the other watches and takes over on failure.

Why it matters

A single scheduler is a SPOF. If it crashes, no jobs run until it restarts — potentially delaying critical batch jobs (billing runs, report generation, data exports) for hours.

Trade-off

Leader election with distributed locks prevents split-brain (both schedulers running jobs simultaneously and causing double execution). The tradeoff is added complexity and a brief leader election gap (1-3 seconds) on failover.

Real world

Quartz Scheduler (Java) supports clustered mode. AWS EventBridge is inherently distributed. Celery Beat uses Redis for leader election. Kubernetes CronJob uses etcd-backed leadership.

Capacity math

A scheduler cluster processes 1M job triggers/day easily. Leader election overhead is < 1ms per lock refresh.

In the real world: Quartz Scheduler (Java) supports clustered mode. AWS EventBridge is inherently distributed. Celery Beat uses Redis for leader election. Kubernetes CronJob uses etcd-backed leadership.

Add Postgres for Job Metadata

Connect Postgres to store job definitions, schedules, execution history, and status.

What it does

Postgres stores the persistent state of the job scheduler: job definitions (what to run and when), execution history (what ran, when, with what result), and current job status.

Why it matters

The scheduler must survive restarts. Storing job state in memory means all pending jobs are lost on crash. Postgres provides durable storage — on restart, the scheduler reads all enabled jobs and resumes scheduling.

Trade-off

Database-backed scheduling requires locking to prevent duplicate job execution. At job trigger time, the scheduler atomically marks the job as 'running' in Postgres before executing it.

Real world

Sidekiq Pro uses Postgres for scheduled job persistence. Airflow stores DAG runs in Postgres/MySQL. Temporal stores workflow state in Postgres/Cassandra.

Capacity math

1M jobs × 1KB per job definition = 1GB. 1B executions/year × 200 bytes each = 200GB execution history — partition and archive old executions to object storage.

In the real world: Sidekiq Pro uses Postgres for scheduled job persistence. Airflow stores DAG runs in Postgres/MySQL. Temporal stores workflow state in Postgres/Cassandra.

Add Redis for Job Locking

Add Redis to store distributed locks that prevent multiple scheduler instances from running the same job simultaneously.

What it does

Redis provides the distributed locking mechanism that ensures exactly-once job execution even when multiple scheduler instances are running simultaneously.

Why it matters

Without locks, if two schedulers fire at 3:00 AM for the daily billing job, both execute — resulting in customers being billed twice. Redis atomic SET NX (set if not exists) guarantees only one scheduler wins the lock.

Trade-off

Lock TTL must exceed the job's maximum expected duration. If the job takes longer than TTL, the lock expires and another scheduler might start it. Set TTL conservatively (3-10× expected duration).

Real world

Sidekiq uses Redis for job locking. Celery uses Redis or RabbitMQ. Kubernetes uses etcd locks for leader election. All enterprise schedulers use some form of distributed locking.

Capacity math

Each lock is one Redis key (~100 bytes). For 10K concurrent jobs, that's 1MB of Redis state. Redis SETNX handles 100K lock operations/second.

In the real world: Sidekiq uses Redis for job locking. Celery uses Redis or RabbitMQ. Kubernetes uses etcd locks for leader election. All enterprise schedulers use some form of distributed locking.

Primary Scheduler Down: Primary scheduler crashes. Standby must detect this and assume leadership within 30 seconds.

§2Step 3 — Deep Dive

3Deep Dive

System	Reliability	Workflow support	Observability	Best for	Cost	Ops burden
Celery + Redis/RabbitMQ	At-least-once	No	Flower UI	Python workers, simple tasks ✓	Medium	Medium
Temporal	Exactly-once	Yes (durable workflows)	Built-in UI	Long-running workflows, retries	High	High
Quartz Scheduler (Java)	At-least-once	No	Limited	Java ecosystem, cron jobs	Low	Medium
AWS Step Functions	Exactly-once	Yes (state machines)	CloudWatch	AWS-native, serverless	High	Low
Airflow	At-least-once	DAG-based	Full UI	Data pipelines, ETL workflows	Medium	High

Distributed job schedulers — Celery for simplicity, Temporal for durable workflows.

pythonJob scheduler — leader election + distributed cron with Redis sorted set

import redis, time, threading, json
r = redis.Redis()
LEADER_TTL = 10

def try_become_leader(node_id: str) -> bool:
    return bool(r.set("scheduler:leader", node_id, nx=True, ex=LEADER_TTL))

def is_leader(node_id: str) -> bool:
    return r.get("scheduler:leader") == node_id.encode()

def heartbeat_leader(node_id: str):
    while True:
        if is_leader(node_id):
            r.expire("scheduler:leader", LEADER_TTL)
        time.sleep(5)

def enqueue_job(job_type: str, payload: dict, run_at: float):
    r.zadd("jobs:pending", {json.dumps({
        'type': job_type, 'payload': payload
    }): run_at})

def run_scheduler_loop(node_id: str):
    threading.Thread(target=heartbeat_leader, args=(node_id,), daemon=True).start()
    while True:
        if is_leader(node_id):
            now = time.time()
            jobs = r.zrangebyscore("jobs:pending", 0, now)
            if jobs:
                r.zremrangebyscore("jobs:pending", 0, now)
                for job_raw in jobs:
                    r.lpush("jobs:queue", job_raw)
        time.sleep(1)

Component	Why Add It	Tradeoff
Second Job Scheduler for HA	A single scheduler is a SPOF.	Leader election with distributed locks prevents split-brain (both schedulers running jobs simultaneously and causing double execution).
Postgres for Job Metadata	The scheduler must survive restarts.	Database-backed scheduling requires locking to prevent duplicate job execution.
Redis for Job Locking	Without locks, if two schedulers fire at 3:00 AM for the daily billing job, both execute — resulting in customers being billed twice.	Lock TTL must exceed the job's maximum expected duration.

Design decision tradeoffs

Primary Scheduler Down

Primary scheduler crashes. Standby must detect this and assume leadership within 30 seconds.

Scheduler Network Isolation

Scheduler-1 becomes unreachable (network partition). Redis TTL expires, standby acquires leader lock. If scheduler-1 reconnects, it must detect lost leadership and step down gracefully.

Job Queue Saturation

Scheduler triggers burst of jobs (e.g., 1000 concurrent cron tasks at midnight). Message queue backs up. Scheduler must not re-dispatch the same job; instead, backpressure should cause the queue to slow down job dispatching.

Two schedulers running simultaneously would dispatch the same job twice. Use Redis leader election: each scheduler tries SETNX scheduler:leader {nodeID} with TTL=30s. Only one succeeds — that's the leader.

Leader renews its TTL every 10s (heartbeat). If leader fails, its key expires after 30s. Standby detects the missing key and acquires leadership via SETNX.

Idempotent job execution: each job run has a unique executionID = CRC(jobID + scheduledTime). Before executing, INSERT INTO job_executions (executionID) with UNIQUE constraint. If the INSERT fails (key exists), another node already ran it — skip.

§3Step 4 — Wrap Up

4Wrap Up

Decision	Choice	Why
Second Job Scheduler for HA	A second job scheduler instance runs in hot-standby mode.	A single scheduler is a SPOF.
Postgres for Job Metadata	Postgres stores the persistent state of the job scheduler: job definitions (what to run and when), execution history (what ran, when, with what result), and current job status.	The scheduler must survive restarts.
Redis for Job Locking	Redis provides the distributed locking mechanism that ensures exactly-once job execution even when multiple scheduler instances are running simultaneously.	Without locks, if two schedulers fire at 3:00 AM for the daily billing job, both execute — resulting in customers being billed twice.

Key design decisions

If the interviewer asks to scale 10×: 10x the load — architectural moves that work. Identify the single bottleneck (usually the database write path) and address it first before horizontal scaling.

10× Target1K RPSwhere your architecture must hold

What's next