Message Queues & Event-Driven Architecture
A message queue is a buffer that decouples the producer of a message from its consumer. Instead of Service A calling Service B directly and waiting for a response, A drops a message into a queue and continues working. B processes it at its own pace.
This simple shift — from synchronous to asynchronous — unlocks reliability, scalability, and flexibility patterns that are difficult or impossible to achieve in direct request-response architectures.
Why Queues?
The Problem with Synchronous Coupling
User Request → [Order Service] → [Payment Service] → [Inventory Service] → [Email Service]

If any service in this chain is slow or down, the entire chain fails. In a synchronous chain:
- Availability multiplies downward: If each service is 99.9% available, a 4-service chain is ~99.6% available
- Latency adds up: Each hop adds latency; the user waits for the entire chain
- Cascading failures: One slow service blocks all upstream services
The Queue Solution
User Request → [Order Service] → [Queue] → [Payment Consumer]
                                         → [Inventory Consumer]
                                         → [Email Consumer]

- Order Service succeeds immediately after enqueuing — it doesn’t wait for downstream services
- Payment, Inventory, and Email process the message independently and at their own pace
- If Email is down, messages queue up; it processes them when it recovers — no messages lost
Core Concepts
Point-to-Point (Queue)
Section titled “Point-to-Point (Queue)”One producer, one consumer. Each message is delivered to exactly one consumer instance.
[Producer] → [Queue] → [Consumer A]   (Consumer B never sees this message)

- Use for: task distribution, job queues, load-leveling
- Each message represents a unit of work to be done exactly once
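This work-queue behavior can be sketched in-process with the standard library's queue.Queue standing in for a real broker; the worker names and job payloads below are illustrative:

```python
import queue
import threading

jobs = queue.Queue()   # stands in for the broker-managed queue
results = []

def worker(name):
    while True:
        job = jobs.get()
        if job is None:               # sentinel: shut this worker down
            break
        results.append((name, job))   # each job is handled by exactly one worker

# Two competing consumers on one queue: a message goes to one of them, never both.
threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(2)]
for t in threads:
    t.start()
for j in ["job-1", "job-2", "job-3"]:
    jobs.put(j)
for _ in threads:
    jobs.put(None)
for t in threads:
    t.join()
```

However the three jobs are split between the two workers, each job is processed exactly once.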
Pub/Sub (Topic)
One producer, many consumers. Each subscriber gets its own copy of every message.
[Producer] → [Topic] → [Consumer: Analytics]
                     → [Consumer: Notification Service]
                     → [Consumer: Audit Logger]

- Use for: event broadcasting, fan-out, event-driven architectures
- Publishers don’t know who’s listening; adding a new subscriber requires no change to the publisher
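The same fan-out semantics can be sketched in-process; the Topic class and subscriber names here are illustrative, not a broker API:

```python
from collections import defaultdict

class Topic:
    """Minimal in-process pub/sub sketch: every subscriber gets its own copy."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        for handler in self.subscribers:   # fan-out: one copy per subscriber
            handler(event)

orders = Topic()
seen = defaultdict(list)
orders.subscribe(lambda e: seen["analytics"].append(e))
orders.subscribe(lambda e: seen["audit"].append(e))

# The publisher knows neither subscriber by name; adding a third
# subscriber would require no change to this line.
orders.publish({"order_id": 1})
```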
Kafka vs. RabbitMQ
These are the two dominant open-source messaging systems, with fundamentally different designs.
Apache Kafka
Kafka is a distributed commit log. Messages are written to an immutable, ordered log (a topic), partitioned for parallelism. Consumers read from the log at their own offset and can replay messages.
Architecture:
- Topic: A named log stream
- Partition: Ordered, immutable sub-log within a topic. The unit of parallelism.
- Consumer Group: Multiple consumers sharing a topic, each partition consumed by one member at a time
- Offset: The position of a consumer in the log. Consumer-managed; can be replayed.
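The key-to-partition mapping behind per-partition ordering can be illustrated with a hash-based sketch. Kafka's default partitioner actually uses a murmur2 hash; crc32 here is just a stand-in for any stable hash:

```python
import zlib

def assign_partition(key: bytes, num_partitions: int) -> int:
    # Deterministic hash of the message key. Same key → same partition,
    # so all events for one entity stay in order relative to each other.
    return zlib.crc32(key) % num_partitions

# Every event keyed by this user id lands on the same partition:
p1 = assign_partition(b"user-42", 6)
p2 = assign_partition(b"user-42", 6)
assert p1 == p2
```

This is why choosing a good partition key matters: it determines both ordering scope and how evenly load spreads across partitions.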
Strengths:
- Extremely high throughput (millions of messages/sec)
- Message retention: messages stay on disk for a configurable period (hours to weeks) — consumers can replay history
- Naturally ordered within a partition
- Excellent for event sourcing, audit logs, stream processing (Kafka Streams, Flink)
Weaknesses:
- Operationally complex (ZooKeeper / KRaft, replication, partition management)
- No native routing / header-based filtering
- Not designed for individual message acknowledgement / dead-lettering at a fine-grained level
Use Kafka for: High-volume event streams, audit logs, data pipelines, analytics, inter-service event buses.
RabbitMQ
RabbitMQ is a traditional message broker. Messages are routed through exchanges to queues based on routing rules. Once a consumer acknowledges a message, it’s gone.
Architecture:
- Exchange: Receives messages from producers and routes them to queues based on type (direct, topic, fanout, headers)
- Queue: Holds messages waiting for a consumer
- Binding: Rules connecting an exchange to a queue
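Topic-exchange binding keys use `*` (matches exactly one dot-separated word) and `#` (matches zero or more words). A toy matcher, not RabbitMQ's actual implementation, shows the semantics:

```python
def topic_matches(binding: str, routing_key: str) -> bool:
    """Sketch of RabbitMQ topic-exchange matching:
    '*' matches exactly one word, '#' matches zero or more words."""
    def match(pat, key):
        if not pat:
            return not key
        if pat[0] == "#":
            # '#' may absorb any number of remaining words (including none)
            return any(match(pat[1:], key[i:]) for i in range(len(key) + 1))
        if not key:
            return False
        if pat[0] == "*" or pat[0] == key[0]:
            return match(pat[1:], key[1:])
        return False
    return match(binding.split("."), routing_key.split("."))

topic_matches("order.*.created", "order.eu.created")   # True
topic_matches("order.#", "order.eu.created")           # True
topic_matches("order.*", "order.eu.created")           # False: '*' is one word only
```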
Strengths:
- Rich routing (header-based, pattern-based, fanout)
- Per-message acknowledgement and dead-letter queues
- Simpler operational model than Kafka for low-to-medium throughput
- Priority queues, delayed messages, TTLs on messages
Weaknesses:
- No built-in message replay (messages are deleted after acknowledgement)
- Throughput ceiling lower than Kafka
Use RabbitMQ for: Task queues, workflow orchestration, routing-heavy patterns, anything requiring per-message acknowledgement and dead-lettering.
Quick Comparison
| Aspect | Kafka | RabbitMQ |
|---|---|---|
| Model | Distributed log (pull) | Message broker (push) |
| Message retention | Days to weeks (replay possible) | Until acknowledged (no replay) |
| Throughput | Very high (millions/sec) | High (tens of thousands/sec) |
| Routing | Topic + partition key | Exchange types (direct, topic, fanout) |
| Ordering | Per partition | Per queue (single consumer) |
| Ops complexity | High | Low-Medium |
| Best for | Event streams, analytics | Task queues, microservice messaging |
Managed Alternatives
| Service | Provider | Based On |
|---|---|---|
| Amazon SQS | AWS | Custom (queue semantics) |
| Amazon SNS | AWS | Pub/Sub |
| Amazon EventBridge | AWS | Event bus with schema registry |
| Azure Service Bus | Azure | Enterprise messaging |
| Google Pub/Sub | GCP | Global pub/sub |
Delivery Guarantees
One of the most critical properties of a messaging system is what it promises about message delivery.
| Guarantee | Meaning | Risk |
|---|---|---|
| At-most-once | Message delivered 0 or 1 times; may be lost | Data loss if consumer crashes before processing |
| At-least-once | Message delivered 1+ times; may be duplicated | Must handle duplicates (idempotency required) |
| Exactly-once | Message delivered exactly once | Hardest to achieve; high overhead |
Idempotency
An idempotent consumer can safely receive and process the same message multiple times:
```python
def handle_payment_event(event: dict):
    payment_id = event["payment_id"]

    # Check: has this payment already been processed?
    if db.exists("SELECT 1 FROM processed_payments WHERE id = ?", payment_id):
        return  # Already processed — silently skip

    # Process the payment
    process_payment(event)

    # Mark as processed
    db.execute("INSERT INTO processed_payments (id) VALUES (?)", payment_id)
```

The deduplication key (payment_id here) is the message’s idempotency key. This should be a stable, unique identifier generated by the producer.
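A check-then-insert has a race window if two consumers receive the same duplicate concurrently; letting a unique constraint arbitrate closes it. A sketch using sqlite3, with the `charges` list standing in for real payment processing:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_payments (id TEXT PRIMARY KEY)")

charges = []

def handle_payment_event(event: dict):
    try:
        # The PRIMARY KEY makes the insert itself the atomic dedup check:
        # a second delivery of the same id raises IntegrityError.
        db.execute("INSERT INTO processed_payments (id) VALUES (?)",
                   (event["payment_id"],))
    except sqlite3.IntegrityError:
        return  # duplicate delivery; already handled
    charges.append(event["payment_id"])  # stand-in for process_payment(event)

handle_payment_event({"payment_id": "p-1"})
handle_payment_event({"payment_id": "p-1"})  # duplicate: silently skipped
```

In a real consumer, the insert and the side effect should share one transaction, so a crash between them can't leave the message marked as processed but not actually handled.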
Dead Letter Queues (DLQ)
A DLQ is a queue that receives messages that couldn’t be processed successfully after N retry attempts. Instead of losing the message or blocking normal flow, it’s parked for inspection and reprocessing.
[Normal Queue] → [Consumer]
        (fails 3 times) ↓
      [DLQ] ← [Ops team investigates / replays]

Always configure a DLQ in production. Without one, poison messages (messages that always cause processing failures) block queues or cause infinite retry loops.
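The retry-then-park flow can be sketched as follows; MAX_ATTEMPTS, the handler, and the list-backed DLQ are illustrative, since real brokers move messages via redelivery counts or DLQ policies:

```python
MAX_ATTEMPTS = 3

def consume(message: dict, handler, dlq: list):
    """Retry a message up to MAX_ATTEMPTS, then park it on the DLQ."""
    last_error = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            handler(message)
            return
        except Exception as exc:
            last_error = str(exc)
    dlq.append({"message": message, "error": last_error, "attempts": MAX_ATTEMPTS})

dlq = []

def poison_handler(msg):
    raise ValueError("bad payload")   # a poison message: always fails

consume({"id": 1}, poison_handler, dlq)
# The queue keeps flowing; the failed message is parked with its error context.
```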
Event-Driven Architecture (EDA) Patterns
Choreography
Services react to events independently. No central orchestrator.
[Order Placed Event] → Payment Service (subscribes, charges card, emits "Payment Processed")
                     → Inventory Service (subscribes, reserves stock, emits "Stock Reserved")
                     → Email Service (subscribes to "Payment Processed", sends confirmation)

- Pros: Loose coupling, services evolve independently
- Cons: Hard to trace the overall flow; debugging requires aggregating events across services
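Choreography can be sketched with a tiny in-process event bus; the event names follow the diagram above, while subscribe/emit are illustrative helpers, not a real bus API:

```python
from collections import defaultdict

handlers = defaultdict(list)
log = []   # a trace of everything that happened, in order

def subscribe(event_type):
    def register(fn):
        handlers[event_type].append(fn)
        return fn
    return register

def emit(event_type, payload=None):
    log.append(event_type)
    for fn in handlers[event_type]:
        fn(payload)

@subscribe("OrderPlaced")
def charge_card(order):
    # Payment Service reacts to the order and emits its own event
    emit("PaymentProcessed", order)

@subscribe("PaymentProcessed")
def send_confirmation(order):
    # Email Service reacts to the payment event, not to the order directly
    log.append("email-sent")

emit("OrderPlaced", {"order_id": 1})
```

No service calls another by name; the chain emerges from who subscribes to what, which is exactly why tracing the overall flow takes effort.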
Orchestration
A central orchestrator service coordinates the workflow.
[Order Orchestrator]
  → calls Payment Service (waits for result)
  → calls Inventory Service (waits for result)
  → calls Email Service (waits for result)

- Pros: Clear single source of truth for workflow state; easier to monitor and debug
- Cons: Orchestrator becomes a coupling point; can become a bottleneck
Use choreography for simple, loosely coupled event flows. Use orchestration for complex, long-running workflows with compensation logic (see Saga pattern below).
The Saga Pattern
For distributed transactions spanning multiple services (each with its own database), the Saga pattern breaks the transaction into a sequence of local transactions, each publishing an event to trigger the next.
If any step fails, compensating transactions roll back the previous steps:
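This compensation logic can be sketched as a list of (action, compensation) pairs, unwinding completed steps in reverse order on failure; the step names are illustrative:

```python
def run_saga(steps):
    """Each step is (action, compensation). On failure, run the
    compensations for already-completed steps in reverse order."""
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for comp in reversed(completed):
                comp()
            return False
    return True

trace = []

def reserve_stock():
    raise RuntimeError("out of stock")   # step 3 fails

ok = run_saga([
    (lambda: trace.append("order created"), lambda: trace.append("order cancelled")),
    (lambda: trace.append("card charged"),  lambda: trace.append("card refunded")),
    (reserve_stock,                         lambda: None),
])
# trace: order created, card charged, card refunded, order cancelled
```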
1. Order Service: Create Order → emit "OrderCreated"
2. Payment Service: Charge Card → emit "PaymentProcessed"
3. Inventory: Reserve Stock → emit "StockReserved"

If step 3 fails:

3'. Inventory: (fails) → emit "StockReservationFailed"
2'. Payment Service: Refund Card → emit "PaymentRefunded" (compensating)
1'. Order Service: Cancel Order → (saga complete)

Backpressure
Backpressure is the mechanism by which a consumer signals to a producer to slow down when it’s overwhelmed.
Without backpressure, a fast producer can overwhelm a slow consumer, causing queue depth to grow unboundedly and eventually crash the consumer or exhaust memory.
Techniques:
- Consumer-controlled polling: Consumer pulls messages at its own pace (Kafka’s model — consumers poll, they’re not pushed to)
- Prefetch limit: RabbitMQ’s `basic.qos` limits how many unacknowledged messages a consumer receives
- Queue depth alerting: Alert when queue depth exceeds a threshold; scale consumers or investigate slow processing
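The simplest form of backpressure is a bounded buffer: once it's full, the producer is blocked (or refused outright), as this stdlib sketch shows:

```python
import queue

# A bounded queue applies backpressure: put() blocks once maxsize is
# reached, and put_nowait() fails fast, so a fast producer cannot grow
# the buffer without bound and exhaust memory.
buf = queue.Queue(maxsize=2)

buf.put_nowait(1)
buf.put_nowait(2)
try:
    buf.put_nowait(3)   # queue is full: the producer is pushed back immediately
    overflowed = False
except queue.Full:
    overflowed = True
```

A blocking `put()` would instead stall the producer until the consumer drains an item, which is the same pressure applied implicitly.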