Payment Gateway
§1Step 2 — High-Level Design
Build a payment processor with idempotency, exactly-once semantics, and fraud detection.
Connect Postgres to durably store payment transactions with ACID guarantees.
Postgres stores the authoritative payment records: transaction ID, amount, currency, status (pending/completed/failed), payer, recipient, and timestamp.
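Payments require ACID guarantees. If a charge succeeds at the payment processor but the API crashes before writing to the database, the payment record is lost. A Postgres transaction makes the local writes (payment row, ledger entries, status change) atomic: either all of them commit or none do. It cannot span the call to the external processor, so that gap is closed separately with idempotency keys and reconciliation.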
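One common way to handle the crash window described above is to commit a `pending` row before calling the processor, so that a crash leaves a durable record for later reconciliation. The following is a minimal sketch under stated assumptions: in-memory SQLite stands in for Postgres, and `open_db`, `pay`, and `charge_fn` are hypothetical names, not part of the original design.

```python
import sqlite3
import uuid


def open_db() -> sqlite3.Connection:
    # Assumption: in-memory SQLite stands in for Postgres in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE payments (
               txn_id TEXT PRIMARY KEY,
               amount_cents INTEGER NOT NULL,
               currency TEXT NOT NULL,
               status TEXT NOT NULL
                   CHECK (status IN ('pending', 'completed', 'failed'))
           )"""
    )
    return conn


def pay(conn, charge_fn, amount_cents: int, currency: str = "USD") -> str:
    txn_id = str(uuid.uuid4())
    # 1. Commit a durable 'pending' row BEFORE touching the processor.
    with conn:
        conn.execute(
            "INSERT INTO payments VALUES (?, ?, ?, 'pending')",
            (txn_id, amount_cents, currency),
        )
    # 2. Call the external processor (hypothetical callable returning bool).
    ok = charge_fn(txn_id, amount_cents)
    # 3. Record the outcome in a second local transaction.
    with conn:
        conn.execute(
            "UPDATE payments SET status = ? WHERE txn_id = ?",
            ("completed" if ok else "failed", txn_id),
        )
    return txn_id
```

If the process dies between steps 2 and 3, the row stays `pending`, which tells a reconciliation job exactly which transactions to look up at the processor.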
Postgres is the write bottleneck for high-volume payments. At 10K transactions/second, Postgres write throughput is the limit. Shard by merchant ID or use Citus for horizontal scaling.
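Sharding by merchant ID needs a stable mapping from merchant to shard. A minimal sketch, assuming a fixed shard count and simple hash-modulo routing; `NUM_SHARDS` and `shard_for_merchant` are illustrative names only.

```python
import hashlib

NUM_SHARDS = 8  # assumed shard count for illustration


def shard_for_merchant(merchant_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a merchant ID to a shard index in [0, num_shards)."""
    # Use a stable hash: Python's built-in hash() is salted per process,
    # so it would route the same merchant differently after a restart.
    digest = hashlib.sha256(merchant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Hash-modulo routing keeps all of a merchant's payments on one shard, but changing `num_shards` remaps most merchants, so a production system would more likely use a lookup table or consistent hashing.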
ACID-compliant relational databases are the usual system of record for payments. PayPal has long used Oracle, and Robinhood uses Postgres for financial records.
Payments table: 10K TPS × 86,400 seconds × 500 bytes = 432 GB/day of raw row data. Partition by date and archive old data to Redshift. With proper indexing, a 16-core Postgres instance can handle roughly 10K TPS, which is why that rate is treated as the single-node ceiling.
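The storage estimate above can be checked with a few lines of arithmetic. The sketch below also shows one possible naming scheme for daily partitions; `partition_name` is a hypothetical helper, and the figures are the ones used in the estimate.

```python
from datetime import date

TPS = 10_000
SECONDS_PER_DAY = 86_400
ROW_BYTES = 500  # average row size assumed by the estimate

rows_per_day = TPS * SECONDS_PER_DAY        # 864,000,000 rows
bytes_per_day = rows_per_day * ROW_BYTES    # 432,000,000,000 bytes
gb_per_day = bytes_per_day / 1e9            # 432.0 GB (decimal gigabytes)


def partition_name(day: date) -> str:
    """Hypothetical naming scheme: one partition per calendar day."""
    return f"payments_{day:%Y_%m_%d}"
```

The figure covers row data only; indexes and write-ahead log volume come on top of it.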
Add Redis to store idempotency keys that prevent duplicate payment processing on network retries.
Redis stores idempotency keys — unique client-generated IDs that ensure a payment request is processed exactly once even if the client retries due to a network timeout.
In payments, network timeouts are dangerous: was the charge processed? The client doesn't know. Without idempotency, retrying causes a double charge. Redis stores the result of the first processing attempt — retries return the cached result instantly.
Idempotency keys expire (typically 24 hours). Retries after expiry are treated as new requests. For long-running payment disputes, a separate deduplication table in Postgres is needed.
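The expiry behavior can be illustrated without a Redis server. This is a minimal single-threaded sketch in which an in-memory dictionary stands in for Redis; `IdempotencyStore` and `handle` are hypothetical names, and the injectable `clock` exists only to make expiry easy to demonstrate.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class IdempotencyStore:
    """In-memory stand-in for a Redis key with a TTL (illustration only)."""

    def __init__(self, ttl_seconds: float = 86_400,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behaves like a missing key
            return None
        return result

    def put(self, key: str, result: dict) -> None:
        self._entries[key] = (self._clock() + self._ttl, result)


def handle(store: IdempotencyStore, key: str,
           charge: Callable[[], dict]) -> dict:
    cached = store.get(key)
    if cached is not None:
        return cached  # retry within the TTL: no second charge
    result = charge()  # first attempt, or a retry after expiry
    store.put(key, result)
    return result
```

A retry inside the TTL returns the stored result, while a retry after expiry runs `charge` again, which is the case a longer-lived deduplication table in Postgres would need to cover.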
Stripe's API accepts idempotency keys on POST requests. Adyen supports idempotency keys for payment retries. Most major payment APIs implement this pattern.
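With a 24-hour TTL, roughly one day of keys is live at any time: 1M payments/day × 200 bytes per idempotency record ≈ 200 MB peak Redis memory, which fits on any Redis instance. This assumes an average of 1M payments/day; a sustained 10K TPS would produce about 864M keys/day (roughly 173 GB) and would need a Redis cluster.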
Add Kafka to propagate payment events to downstream systems: ledger, notifications, fraud detection.
Kafka carries payment lifecycle events (PaymentInitiated, PaymentCompleted, PaymentFailed, RefundRequested) to all downstream consumers.
After processing a payment, the system must update the ledger, send a receipt email, trigger fraud analysis, and notify the merchant — all slow operations. Publishing to Kafka lets the payment API complete instantly while these happen asynchronously.
Downstream systems are eventually consistent with the payment API. The ledger might lag by seconds. For financial reporting, ensure the ledger consumer has exactly-once semantics (Kafka transactions or idempotent writes).
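The "idempotent writes" option can be sketched as a consumer that uses the event ID as a primary key, so a redelivered event is ignored instead of applied twice. This is a minimal illustration under stated assumptions: in-memory SQLite stands in for the ledger database, and `open_ledger`, `apply_event`, and the event fields are hypothetical.

```python
import sqlite3


def open_ledger() -> sqlite3.Connection:
    # Assumption: in-memory SQLite stands in for the ledger database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE ledger_entries (
               event_id TEXT PRIMARY KEY,  -- deduplication key
               account_id TEXT NOT NULL,
               amount_cents INTEGER NOT NULL
           )"""
    )
    return conn


def apply_event(conn: sqlite3.Connection, event: dict) -> bool:
    """Apply one event; return True if it was new, False if a duplicate."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO ledger_entries VALUES (?, ?, ?)",
            (event["event_id"], event["account_id"], event["amount_cents"]),
        )
        return cur.rowcount == 1
```

In Postgres the equivalent statement is `INSERT ... ON CONFLICT DO NOTHING`. With this in place, at-least-once delivery from Kafka yields an effectively exactly-once ledger.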
Stripe uses internal Kafka for payment event propagation. Square's event bus carries payment events. Adyen uses an event-driven architecture for payment lifecycle management.
10K payments/second × 2KB per event = 20MB/second Kafka throughput. With 3x replication, 60MB/second. Well within a single 3-broker Kafka cluster capacity.
§2Step 3 — Deep Dive
| Strategy | Prevents double-charge? | Complexity | Auditability | Best for | Cost | Ops burden |
|---|---|---|---|---|---|---|
| Idempotency key (Redis SETNX) | Yes (client-provided key) | Low | No | API-level dedup for retries ✓ | Medium | Low |
| Double-entry ledger | Yes (balance constraints) | Medium | Full audit trail | Financial systems, compliance | Low | Low |
| Distributed 2PC | Yes (across DBs) | High | Yes | Multi-bank transactions | Medium | High |
| Saga pattern | Yes (compensating txns) | High | Yes (event log) | Microservices, long-running flows | Medium | High |
| Optimistic locking (version) | Yes (retry on conflict) | Low | Partial | Low-contention payment flows | Low | Low |
Payment consistency strategies — idempotency keys + double-entry are the foundation.
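```python
import json
import os
import uuid

import psycopg2
import redis

DATABASE_URL = os.environ["DATABASE_URL"]  # assumption: DSN comes from the environment
r = redis.Redis()


def process_payment(idempotency_key: str, from_account: str,
                    to_account: str, amount_cents: int) -> dict:
    lock_key = f"payment:idem:{idempotency_key}"
    result_key = f"payment:result:{idempotency_key}"
    # SET NX EX: only the first request with this key acquires the lock.
    if not r.set(lock_key, "processing", nx=True, ex=86400):
        cached = r.get(result_key)
        return json.loads(cached) if cached else {"status": "processing"}

    try:
        conn = psycopg2.connect(DATABASE_URL)
        try:
            with conn:  # commits on success, rolls back on exception
                with conn.cursor() as cur:
                    # Lock the payer's row so concurrent debits serialize.
                    # Assumes accounts.balance is kept in sync with the ledger elsewhere.
                    cur.execute(
                        "SELECT balance FROM accounts WHERE id = %s FOR UPDATE",
                        (from_account,))
                    row = cur.fetchone()
                    if row is None:
                        raise ValueError("Unknown account")
                    if row[0] < amount_cents:
                        raise ValueError("Insufficient funds")
                    # Double entry: one debit and one credit share a txn_id.
                    txn_id = str(uuid.uuid4())
                    cur.execute(
                        "INSERT INTO ledger (txn_id, account_id, amount, type) "
                        "VALUES (%s, %s, %s, 'debit')",
                        (txn_id, from_account, -amount_cents))
                    cur.execute(
                        "INSERT INTO ledger (txn_id, account_id, amount, type) "
                        "VALUES (%s, %s, %s, 'credit')",
                        (txn_id, to_account, amount_cents))
        finally:
            conn.close()
    except Exception:
        # Release the lock so the client can retry after a failure.
        r.delete(lock_key)
        raise

    # Cache the result only after the transaction has committed.
    result = {"status": "success", "txn_id": txn_id}
    r.set(result_key, json.dumps(result), ex=86400)
    return result
```

| Component | Why Add It | Tradeoff |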
|---|---|---|
| Postgres for Payment Records | Payments require ACID guarantees. | Postgres is the write bottleneck for high-volume payments. |
| Redis for Idempotency Keys | In payments, network timeouts are dangerous: was the charge processed? | Idempotency keys expire (typically 24 hours). |
| Message Queue for Payment Events | After processing a payment, the system must update the ledger, send a receipt email, trigger fraud analysis, and notify the merchant — all slow operations. | Downstream systems are eventually consistent with the payment API. |
Design decision tradeoffs
api-1 crashes after charging the customer but before writing to the database. The client retries and is charged twice. How would you implement idempotency keys so that api-1 stores charge_id → result in Redis before returning and retries receive the cached result?
The network between api-1 and the external payment processor goes down after the charge succeeds remotely but before the response arrives. The client gets a timeout and retries. How do idempotency keys prevent double-charging, and how do reconciliation jobs detect discrepancies?
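The reconciliation half of this scenario can be sketched as a comparison between local records and the processor's records. This is a minimal illustration, assuming both sides can be exported as a mapping from idempotency key to status; `reconcile` and the status strings are hypothetical.

```python
from typing import Dict, List, Tuple


def reconcile(local: Dict[str, str],
              processor: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (key, local_status, processor_status) for every mismatch.

    Both inputs map an idempotency key to a status string. A key that is
    absent on one side is reported with the status 'missing'.
    """
    mismatches = []
    for key in sorted(set(local) | set(processor)):
        ours = local.get(key, "missing")
        theirs = processor.get(key, "missing")
        if ours != theirs:
            mismatches.append((key, ours, theirs))
    return mismatches
```

A row that is `pending` locally but `completed` at the processor is the timeout case from the scenario: the charge went through and only the local status needs to be repaired.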
A flash sale triggers 10K concurrent payment attempts. The Postgres ledger database becomes a write bottleneck — each payment requires an atomic debit+credit. How do you use the message queue to buffer writes, apply optimistic locking, and partition the ledger by user ID?
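The optimistic locking mentioned in this scenario can be sketched with a version column: a debit succeeds only if the row still has the version the caller read. This is a minimal sketch under stated assumptions; in-memory SQLite stands in for Postgres, and the table layout and function names are hypothetical.

```python
import sqlite3
from typing import Optional, Tuple


def open_accounts() -> sqlite3.Connection:
    # Assumption: in-memory SQLite stands in for Postgres in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
        "balance_cents INTEGER NOT NULL, version INTEGER NOT NULL)"
    )
    return conn


def read_account(conn: sqlite3.Connection,
                 account_id: str) -> Optional[Tuple[int, int]]:
    """Return (balance_cents, version), or None if the account is unknown."""
    return conn.execute(
        "SELECT balance_cents, version FROM accounts WHERE id = ?",
        (account_id,),
    ).fetchone()


def try_debit(conn: sqlite3.Connection, account_id: str,
              amount_cents: int, expected_version: int) -> bool:
    """Debit only if the version is unchanged and funds are sufficient."""
    with conn:
        cur = conn.execute(
            "UPDATE accounts "
            "SET balance_cents = balance_cents - ?, version = version + 1 "
            "WHERE id = ? AND version = ? AND balance_cents >= ?",
            (amount_cents, account_id, expected_version, amount_cents),
        )
        return cur.rowcount == 1  # 0 rows: stale version or low balance
```

When `try_debit` returns `False`, the caller re-reads the account and retries. That is cheap under low contention but can cause many retries when thousands of payments hit the same account at once.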
§3Step 4 — Wrap Up
| Decision | Choice | Why |
|---|---|---|
| Postgres for Payment Records | Postgres stores the authoritative payment records: transaction ID, amount, currency, status (pending/completed/failed), payer, recipient, and timestamp. | Payments require ACID guarantees. |
| Redis for Idempotency Keys | Redis stores idempotency keys — unique client-generated IDs that ensure a payment request is processed exactly once even if the client retries due to a network timeout. | In payments, network timeouts are dangerous: was the charge processed? |
| Message Queue for Payment Events | Kafka carries payment lifecycle events (PaymentInitiated, PaymentCompleted, PaymentFailed, RefundRequested) to all downstream consumers. | After processing a payment, the system must update the ledger, send a receipt email, trigger fraud analysis, and notify the merchant — all slow operations. |
Key design decisions