Distributed Job Scheduler
§1Step 2 — High-Level Design
Build a cron-like system that runs millions of jobs reliably, using idempotency, retry backoff, and dead-letter queues.
Add a second job scheduler instance to ensure no scheduled jobs are lost if one scheduler crashes.
A second job scheduler instance runs in hot-standby mode. Both schedulers compete for a distributed lock; the winner becomes leader and executes jobs, while the other watches and takes over on failure.
A single scheduler is a single point of failure (SPOF). If it crashes, no jobs run until it restarts, potentially delaying critical batch jobs (billing runs, report generation, data exports) for hours.
Leader election with distributed locks guards against split-brain (both schedulers running jobs simultaneously and causing double execution). The tradeoff is added complexity and a brief leader-election gap on failover (1-3 seconds with a short lock TTL).
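The size of the failover gap follows from the lock settings. As a rough sketch, assume the leader refreshes a TTL-based lock on a fixed heartbeat and the standby polls for the lock on a fixed interval (the function name and the example values below are illustrative, and network latency is ignored):

```python
from typing import Tuple


def failover_gap(lock_ttl: float, heartbeat_interval: float,
                 poll_interval: float) -> Tuple[float, float]:
    """Return (best_case, worst_case) seconds between a leader crash and takeover."""
    if heartbeat_interval >= lock_ttl:
        raise ValueError("heartbeat must be shorter than the lock TTL")
    # Best case: the leader dies just before its next refresh, so only
    # (ttl - heartbeat) remains on the lock, and the standby polls right at expiry.
    best = lock_ttl - heartbeat_interval
    # Worst case: the leader dies right after a refresh (full TTL remains),
    # and the standby's poll lands just before expiry, so it waits one more interval.
    worst = lock_ttl + poll_interval
    return best, worst


best, worst = failover_gap(lock_ttl=10, heartbeat_interval=5, poll_interval=1)
print(f"takeover between {best}s and {worst}s after the crash")
```

Under these assumptions, a gap of only a few seconds requires a lock TTL of only a few seconds, which in turn means more frequent heartbeats.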
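Quartz Scheduler (Java) supports a clustered mode backed by a shared JDBC job store. AWS EventBridge is a managed, distributed service. Celery Beat has no built-in leader election, so it is normally run as a single instance or paired with an external lock (for example, a Redis-backed one). Kubernetes runs CronJobs from its controller manager, which elects a leader through etcd-backed leases.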
A scheduler cluster processes 1M job triggers/day easily. Leader election overhead is < 1ms per lock refresh.
Connect Postgres to store job definitions, schedules, execution history, and status.
Postgres stores the persistent state of the job scheduler: job definitions (what to run and when), execution history (what ran, when, with what result), and current job status.
The scheduler must survive restarts. Storing job state in memory means all pending jobs are lost on crash. Postgres provides durable storage — on restart, the scheduler reads all enabled jobs and resumes scheduling.
Database-backed scheduling requires locking to prevent duplicate job execution. At job trigger time, the scheduler atomically marks the job as 'running' in Postgres before executing it.
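The atomic "mark as running" step can be sketched as a conditional update whose affected-row count tells the caller whether it won the claim. This minimal example uses Python's built-in `sqlite3` as a stand-in for Postgres so that it runs anywhere; the `jobs` table, its columns, and `claim_job` are assumptions made for illustration, not part of the original design.

```python
import sqlite3


def claim_job(conn: sqlite3.Connection, job_id: int, scheduler_id: str) -> bool:
    """Flip a job from 'pending' to 'running'; return True only for the winner."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE jobs SET status = 'running', claimed_by = ? "
            "WHERE id = ? AND status = 'pending'",
            (scheduler_id, job_id),
        )
    # Exactly one row changes for the first caller; later callers match no rows.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL, claimed_by TEXT)"
)
conn.execute("INSERT INTO jobs (id, status) VALUES (1, 'pending')")
conn.commit()

first = claim_job(conn, 1, "scheduler-a")
second = claim_job(conn, 1, "scheduler-b")
print(first, second)
```

On Postgres the same conditional `UPDATE` works unchanged, and `SELECT ... FOR UPDATE SKIP LOCKED` is a common alternative when several workers pull from the same table.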
Quartz's JDBC job store persists jobs and triggers in a relational database. Airflow stores DAG runs in Postgres/MySQL. Temporal stores workflow state in Postgres, MySQL, or Cassandra.
1M jobs × 1KB per job definition = 1GB. 1B executions/year × 200 bytes each = 200GB execution history — partition and archive old executions to object storage.
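The storage estimate above can be reproduced with a few lines of arithmetic. The sketch below uses decimal units (1 KB = 1,000 bytes), which is an assumption about how the figures were rounded; it also derives the daily execution rate implied by one billion executions per year.

```python
KB = 1_000
GB = 1_000_000_000

job_definitions = 1_000_000
bytes_per_definition = 1 * KB
executions_per_year = 1_000_000_000
bytes_per_execution = 200

definitions_gb = job_definitions * bytes_per_definition / GB
history_gb_per_year = executions_per_year * bytes_per_execution / GB
executions_per_day = executions_per_year / 365

print(f"job definitions:   {definitions_gb:.0f} GB")
print(f"execution history: {history_gb_per_year:.0f} GB per year")
print(f"implied load:      {executions_per_day:,.0f} executions per day")
```

The history table dominates by two orders of magnitude, which is why partitioning and archiving target executions rather than definitions.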
Add Redis to store distributed locks that prevent multiple scheduler instances from running the same job simultaneously.
Redis provides the distributed locking mechanism that keeps two scheduler instances from running the same job at the same time. Locks alone give mutual exclusion, not exactly-once execution, so jobs should still be idempotent.
Without locks, if two schedulers fire at 3:00 AM for the daily billing job, both execute — resulting in customers being billed twice. Redis atomic SET NX (set if not exists) guarantees only one scheduler wins the lock.
Lock TTL must exceed the job's maximum expected duration. If the job takes longer than TTL, the lock expires and another scheduler might start it. Set TTL conservatively (3-10× expected duration).
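The TTL failure mode is easier to see with a toy model. The sketch below uses an in-memory store as a stand-in for Redis `SET key value NX EX ttl`, with time passed in explicitly so the scenario is deterministic; `InMemoryLockStore` and `lock_ttl` are hypothetical names, and the 3× safety factor is just the low end of the range mentioned above.

```python
from typing import Dict, Tuple


class InMemoryLockStore:
    """Toy stand-in for Redis SET NX EX semantics; not for production use."""

    def __init__(self) -> None:
        self._locks: Dict[str, Tuple[str, float]] = {}  # key -> (owner, expires_at)

    def acquire(self, key: str, owner: str, ttl: float, now: float) -> bool:
        held = self._locks.get(key)
        if held is not None and held[1] > now:
            return False  # a live lock exists, so the caller loses
        self._locks[key] = (owner, now + ttl)
        return True

    def release(self, key: str, owner: str) -> bool:
        held = self._locks.get(key)
        if held is None or held[0] != owner:
            return False  # never delete a lock that someone else now holds
        del self._locks[key]
        return True


def lock_ttl(expected_seconds: float, safety_factor: float = 3.0) -> float:
    """Size the TTL as a multiple of the job's expected duration."""
    return expected_seconds * safety_factor


store = InMemoryLockStore()
ttl = lock_ttl(expected_seconds=60)  # 180 seconds
a_start = store.acquire("job:billing", "scheduler-a", ttl, now=0)
b_early = store.acquire("job:billing", "scheduler-b", ttl, now=100)  # lock still live
b_late = store.acquire("job:billing", "scheduler-b", ttl, now=200)   # job overran the TTL
a_release = store.release("job:billing", "scheduler-a")              # A no longer owns it
print(a_start, b_early, b_late, a_release)
```

Once the job overruns its TTL, a second scheduler can start it, and the original holder must not release a lock it no longer owns. With real Redis, the owner-checked release is usually done in a small Lua script so that the compare and the delete happen atomically.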
Sidekiq Enterprise uses Redis locks for unique jobs. Celery uses Redis or RabbitMQ as its broker. Kubernetes uses etcd-backed leases for leader election. Most production schedulers rely on some form of distributed locking.
Each lock is one Redis key (~100 bytes). For 10K concurrent jobs, that's 1MB of Redis state. A single Redis instance can handle on the order of 100K SET NX operations per second.
§2Step 3 — Deep Dive
| System | Reliability | Workflow support | Observability | Best for | Cost | Ops burden |
|---|---|---|---|---|---|---|
| Celery + Redis/RabbitMQ | At-least-once | No | Flower UI | Python workers, simple tasks ✓ | Medium | Medium |
| Temporal | Exactly-once workflow execution (activities at-least-once) | Yes (durable workflows) | Built-in UI | Long-running workflows, retries | High | High |
| Quartz Scheduler (Java) | At-least-once | No | Limited | Java ecosystem, cron jobs | Low | Medium |
| AWS Step Functions | Exactly-once (Standard workflows) | Yes (state machines) | CloudWatch | AWS-native, serverless | High | Low |
| Airflow | At-least-once | DAG-based | Full UI | Data pipelines, ETL workflows | Medium | High |
Distributed job schedulers — Celery for simplicity, Temporal for durable workflows.
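import json
import threading
import time
import uuid

import redis

r = redis.Redis()
LEADER_KEY = "scheduler:leader"
LEADER_TTL = 10  # seconds; leadership lapses if not refreshed within this window

def try_become_leader(node_id: str) -> bool:
    # SET NX EX succeeds only if no live leader key exists.
    return bool(r.set(LEADER_KEY, node_id, nx=True, ex=LEADER_TTL))

def is_leader(node_id: str) -> bool:
    return r.get(LEADER_KEY) == node_id.encode()

def heartbeat_leader(node_id: str):
    # Refresh the lock at half the TTL. The check and the expire are not
    # atomic; a Lua script would close that small race.
    while True:
        if is_leader(node_id):
            r.expire(LEADER_KEY, LEADER_TTL)
        time.sleep(LEADER_TTL / 2)

def enqueue_job(job_type: str, payload: dict, run_at: float):
    # A unique id keeps two identical jobs from collapsing into one set member.
    job = {"id": str(uuid.uuid4()), "type": job_type, "payload": payload}
    r.zadd("jobs:pending", {json.dumps(job): run_at})

def run_scheduler_loop(node_id: str):
    threading.Thread(target=heartbeat_leader, args=(node_id,), daemon=True).start()
    while True:
        # Standbys keep trying to acquire the lock so they can take over on failure.
        if is_leader(node_id) or try_become_leader(node_id):
            now = time.time()
            for job_raw in r.zrangebyscore("jobs:pending", 0, now):
                # Push before removing, so a crash between the two steps
                # re-dispatches the job (at-least-once) instead of losing it.
                r.lpush("jobs:queue", job_raw)
                r.zrem("jobs:pending", job_raw)
        time.sleep(1)

| Component | Why Add It | Tradeoff |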
|---|---|---|
| Second Job Scheduler for HA | A single scheduler is a SPOF. | Leader election with distributed locks prevents split-brain (both schedulers running jobs simultaneously and causing double execution), at the cost of added complexity and a brief election gap on failover. |
| Postgres for Job Metadata | The scheduler must survive restarts. | Database-backed scheduling requires locking to prevent duplicate job execution. |
| Redis for Job Locking | Without locks, if two schedulers fire at 3:00 AM for the daily billing job, both execute — resulting in customers being billed twice. | Lock TTL must exceed the job's maximum expected duration. |
Design decision tradeoffs
Primary scheduler crashes. Standby must detect this and assume leadership within 30 seconds.
Scheduler-1 becomes unreachable (network partition). Redis TTL expires, standby acquires leader lock. If scheduler-1 reconnects, it must detect lost leadership and step down gracefully.
Scheduler triggers a burst of jobs (e.g., 1000 concurrent cron tasks at midnight) and the message queue backs up. The scheduler must not re-dispatch the same job; instead, backpressure from the queue should slow down the scheduler's dispatching.
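The last two scenarios can be combined into one dispatch guard: re-check leadership before every dispatch, deduplicate on a (job, fire time) key, and defer work when the queue is full. The sketch below is a toy model in which an in-memory deque and set stand in for the real queue and dedupe store; `Dispatcher`, `read_leader`, and `max_depth` are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Optional, Set, Tuple

JobKey = Tuple[str, int]  # (job_id, scheduled fire time as epoch seconds)


class Dispatcher:
    """Toy dispatcher; in-memory structures stand in for real infrastructure."""

    def __init__(self, node_id: str,
                 read_leader: Callable[[], Optional[str]],
                 max_depth: int) -> None:
        self.node_id = node_id
        self.read_leader = read_leader  # e.g. a GET on the leader lock key
        self.max_depth = max_depth
        self.queue: Deque[JobKey] = deque()
        self.seen: Set[JobKey] = set()

    def dispatch(self, job_id: str, fire_time: int) -> str:
        key = (job_id, fire_time)
        if self.read_leader() != self.node_id:
            return "stepped_down"  # lost leadership: stop dispatching
        if key in self.seen:
            return "duplicate"     # already dispatched for this fire time
        if len(self.queue) >= self.max_depth:
            return "deferred"      # queue is full: retry on a later tick
        self.queue.append(key)
        self.seen.add(key)
        return "queued"


leader = {"id": "scheduler-1"}
d = Dispatcher("scheduler-1", lambda: leader["id"], max_depth=2)
results = [
    d.dispatch("billing", 1700000000),
    d.dispatch("billing", 1700000000),  # same job and fire time
    d.dispatch("reports", 1700000000),
    d.dispatch("exports", 1700000000),  # queue already holds two jobs
]
leader["id"] = "scheduler-2"            # the standby took over the lock
results.append(d.dispatch("exports", 1700000000))
print(results)
```

A window still exists between the leadership check and the enqueue, so a stale leader can occasionally slip one dispatch through. That residual risk is why the jobs themselves should be idempotent.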
§3Step 4 — Wrap Up
| Decision | Choice | Why |
|---|---|---|
| Second Job Scheduler for HA | A second job scheduler instance runs in hot-standby mode. | A single scheduler is a SPOF. |
| Postgres for Job Metadata | Postgres stores the persistent state of the job scheduler: job definitions (what to run and when), execution history (what ran, when, with what result), and current job status. | The scheduler must survive restarts. |
| Redis for Job Locking | Redis provides the distributed locking mechanism that prevents concurrent execution of the same job when multiple scheduler instances are running simultaneously. | Without locks, if two schedulers fire at 3:00 AM for the daily billing job, both execute, resulting in customers being billed twice. |
Key design decisions