
Message Queues & Event-Driven Architecture

A message queue is a buffer that decouples the producer of a message from its consumer. Instead of Service A calling Service B directly and waiting for a response, A drops a message into a queue and continues working. B processes it at its own pace.

This simple shift — from synchronous to asynchronous — unlocks reliability, scalability, and flexibility patterns that are impossible in direct request-response architectures.


User Request → [Order Service] → [Payment Service] → [Inventory Service] → [Email Service]

If any service in this chain is slow or down, the entire request fails.

In a synchronous chain:

  • Availability multiplies downward: If each service is 99.9% available, a 4-service chain is ~99.6% available
  • Latency adds up: Each hop adds latency; the user waits for the entire chain
  • Cascading failures: One slow service blocks all upstream services
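The arithmetic behind that availability figure is worth checking directly:

```python
# Each hop must succeed, so availabilities multiply in a synchronous chain.
per_service = 0.999            # 99.9% availability per service
chain = per_service ** 4       # four services in series
print(f"{chain:.4%}")          # 99.6006% -- roughly 1 request in 250 now fails
```

Each extra dependency compounds the loss, which is why long synchronous chains are fragile even when every individual service looks healthy.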
User Request → [Order Service] → [Queue] → [Payment Consumer]
                                         → [Inventory Consumer]
                                         → [Email Consumer]
  • Order Service succeeds immediately after enqueuing — it doesn’t wait for downstream services
  • Payment, Inventory, and Email process the message independently and at their own pace
  • If Email is down, messages queue up; it processes them when it recovers — no messages lost
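The asynchronous flow above can be sketched with Python's standard-library queue standing in for a real broker (the `place_order` and `email_consumer` names are illustrative):

```python
import json
import queue
import time

order_queue: "queue.Queue[str]" = queue.Queue()

def place_order(order_id: str) -> dict:
    """Order Service: enqueue and return immediately -- no waiting on downstream."""
    event = json.dumps({"type": "OrderPlaced", "order_id": order_id, "ts": time.time()})
    order_queue.put(event)                   # hand off to the queue
    return {"status": "accepted", "order_id": order_id}

def email_consumer() -> None:
    """A downstream consumer drains the queue at its own pace, even after downtime."""
    while not order_queue.empty():
        event = json.loads(order_queue.get())
        print("sending confirmation for", event["order_id"])

resp = place_order("o-42")     # succeeds even if the email consumer is down
email_consumer()               # later, the consumer catches up; nothing was lost
```

The producer's latency is now just the enqueue, and a consumer outage only delays delivery instead of failing the request.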

Point-to-point (queue model): one producer, one consumer. Each message is delivered to exactly one consumer instance.

[Producer] → [Queue] → [Consumer A]
(Consumer B never sees this message)
  • Use for: task distribution, job queues, load-leveling
  • Each message represents a unit of work to be done exactly once
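A minimal sketch of point-to-point semantics using Python's thread-safe `queue.Queue`: two competing workers, and each job is handled by exactly one of them:

```python
import queue
import threading

jobs: "queue.Queue[int]" = queue.Queue()
processed: dict[int, str] = {}      # job -> which worker handled it
lock = threading.Lock()

def worker(name: str) -> None:
    while True:
        try:
            job = jobs.get_nowait()  # each job is taken by exactly one worker
        except queue.Empty:
            return
        with lock:
            processed[job] = name

for i in range(10):
    jobs.put(i)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(processed), "jobs processed, each by exactly one worker")
```

Adding more workers raises throughput without changing the producer, which is the load-leveling property the pattern is used for.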

Publish-subscribe (pub/sub): one producer, many consumers. Each subscriber gets its own copy of every message.

                     → [Consumer: Analytics]
[Producer] → [Topic] → [Consumer: Notification Service]
                     → [Consumer: Audit Logger]
  • Use for: event broadcasting, fan-out, event-driven architectures
  • Publishers don’t know who’s listening; adding a new subscriber requires no change to the publisher
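The fan-out behavior can be modeled with a toy in-memory `Topic` class (illustrative only, not any real client library):

```python
from collections import defaultdict
from typing import Callable

class Topic:
    """Minimal in-memory pub/sub: every subscriber gets its own copy of each message."""
    def __init__(self) -> None:
        self.subscribers: list[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self.subscribers.append(handler)   # the publisher never learns who this is

    def publish(self, event: dict) -> None:
        for handler in self.subscribers:
            handler(dict(event))           # independent copy per subscriber

received = defaultdict(list)
orders = Topic()
orders.subscribe(lambda e: received["analytics"].append(e))
orders.subscribe(lambda e: received["notifications"].append(e))
orders.publish({"type": "OrderPlaced", "order_id": "o-1"})

# Adding the audit logger later requires no change to the publisher.
orders.subscribe(lambda e: received["audit"].append(e))
orders.publish({"type": "OrderPlaced", "order_id": "o-2"})
```

After the second publish, analytics and notifications have seen two events each, while the late-arriving audit logger has seen only the one published after it subscribed.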

Kafka and RabbitMQ are the two dominant open-source messaging systems, with fundamentally different designs.

Kafka is a distributed commit log. Messages are written to an immutable, ordered log (a topic), partitioned for parallelism. Consumers read from the log at their own offset and can replay messages.

Architecture:

  • Topic: A named log stream
  • Partition: Ordered, immutable sub-log within a topic. The unit of parallelism.
  • Consumer Group: Multiple consumers sharing a topic, each partition consumed by one member at a time
  • Offset: The position of a consumer in the log. Consumer-managed; can be replayed.
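A toy model of a single partition makes the offset and replay semantics concrete (this illustrates the log abstraction, not the Kafka client API):

```python
class PartitionLog:
    """Toy model of one Kafka partition: an append-only log read by offset."""
    def __init__(self) -> None:
        self.log: list[str] = []            # ordered, immutable messages
        self.offsets: dict[str, int] = {}   # consumer-group positions

    def append(self, msg: str) -> int:
        self.log.append(msg)
        return len(self.log) - 1            # offset of the new record

    def poll(self, group: str, max_records: int = 10) -> list[str]:
        start = self.offsets.get(group, 0)
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)  # advance (commit) the offset
        return batch

    def seek(self, group: str, offset: int) -> None:
        self.offsets[group] = offset        # rewind to replay history

p = PartitionLog()
for msg in ("a", "b", "c"):
    p.append(msg)

print(p.poll("analytics"))     # ['a', 'b', 'c']
p.seek("analytics", 0)         # replay from the beginning
print(p.poll("analytics"))     # ['a', 'b', 'c'] again -- the log retained them
```

Note that polling does not delete anything: a second consumer group starting at offset 0 would see the full history, which is what makes replay and multiple independent readers possible.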

Strengths:

  • Extremely high throughput (millions of messages/sec)
  • Message retention: messages stay on disk for a configurable period (hours to weeks) — consumers can replay history
  • Naturally ordered within a partition
  • Excellent for event sourcing, audit logs, stream processing (Kafka Streams, Flink)

Weaknesses:

  • Operationally complex (Zookeeper / KRaft, replication, partition management)
  • No native routing / header-based filtering
  • Not designed for individual message acknowledgement / dead-lettering at a fine-grained level

Use Kafka for: High-volume event streams, audit logs, data pipelines, analytics, inter-service event buses.

RabbitMQ is a traditional message broker. Messages are routed through exchanges to queues based on routing rules. Once a consumer acknowledges a message, it’s gone.

Architecture:

  • Exchange: Receives messages from producers and routes them to queues based on type (direct, topic, fanout, headers)
  • Queue: Holds messages waiting for a consumer
  • Binding: Rules connecting an exchange to a queue
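A simplified sketch of exchange/binding routing (direct key matching plus a fanout flag; real RabbitMQ also supports topic patterns and header matching):

```python
from collections import defaultdict

class Broker:
    """Toy model of RabbitMQ routing: exchanges route to queues via bindings."""
    def __init__(self) -> None:
        self.queues: dict[str, list[str]] = defaultdict(list)
        # exchange -> list of (routing key, queue name) bindings
        self.bindings: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def bind(self, exchange: str, routing_key: str, queue: str) -> None:
        self.bindings[exchange].append((routing_key, queue))

    def publish(self, exchange: str, routing_key: str, msg: str,
                fanout: bool = False) -> None:
        for key, queue_name in self.bindings[exchange]:
            if fanout or key == routing_key:   # direct match, or copy to every binding
                self.queues[queue_name].append(msg)

b = Broker()
b.bind("payments", "payment.failed", "alerts")
b.bind("payments", "payment.ok", "ledger")

b.publish("payments", "payment.failed", "card declined")
print(b.queues["alerts"])   # ['card declined'] -- only the matching binding fired
print(b.queues["ledger"])   # []
```

The producer only names an exchange and a routing key; which queues receive the message is entirely a function of the bindings, so routing can be rewired without touching producers.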

Strengths:

  • Rich routing (header-based, pattern-based, fanout)
  • Per-message acknowledgement and dead-letter queues
  • Simpler operational model than Kafka for low-to-medium throughput
  • Priority queues, delayed messages, TTLs on messages

Weaknesses:

  • No built-in message replay (messages are deleted after acknowledgement)
  • Throughput ceiling lower than Kafka

Use RabbitMQ for: Task queues, workflow orchestration, routing-heavy patterns, anything requiring per-message acknowledgement and dead-lettering.

Aspect             Kafka                            RabbitMQ
Model              Distributed log (pull)           Message broker (push)
Message retention  Days to weeks (replay possible)  Until acknowledged (no replay)
Throughput         Very high (millions/sec)         High (tens of thousands/sec)
Routing            Topic + partition key            Exchange types (direct, topic, fanout)
Ordering           Per partition                    Per queue (single consumer)
Ops complexity     High                             Low-Medium
Best for           Event streams, analytics         Task queues, microservice messaging
The major clouds also offer fully managed messaging services:

Service             Provider  Based On
Amazon SQS          AWS       Custom (queue semantics)
Amazon SNS          AWS       Pub/Sub
Amazon EventBridge  AWS       Event bus with schema registry
Azure Service Bus   Azure     Enterprise messaging
Google Pub/Sub      GCP       Global pub/sub

One of the most critical properties of a messaging system is what it promises about message delivery.

Guarantee      Meaning                                Risk
At-most-once   Delivered 0 or 1 times; may be lost    Data loss if consumer crashes before processing
At-least-once  Delivered 1+ times; may be duplicated  Must handle duplicates (idempotency required)
Exactly-once   Delivered exactly once                 Hardest to achieve; high overhead

An idempotent consumer can safely receive and process the same message multiple times:

def handle_payment_event(event: dict):
    payment_id = event["payment_id"]
    # Check: has this payment already been processed?
    if db.exists("SELECT 1 FROM processed_payments WHERE id = ?", payment_id):
        return  # Already processed; silently skip
    # Process the payment
    process_payment(event)
    # Mark as processed
    db.execute("INSERT INTO processed_payments (id) VALUES (?)", payment_id)

The deduplication key (payment_id here) is the message’s idempotency key. This should be a stable, unique identifier generated by the producer.
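One caveat with the check-then-insert pattern above: two consumers handling the same duplicate concurrently can both pass the existence check before either inserts. Letting a unique constraint do the deduplication closes that window. A sketch with SQLite (the `process_payment` side effect is simulated):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_payments (id TEXT PRIMARY KEY)")
calls: list[str] = []

def process_payment(event: dict) -> None:
    calls.append(event["payment_id"])      # stands in for the real side effect

def handle_payment_event(event: dict) -> None:
    try:
        # Claim the idempotency key first; a duplicate violates the PRIMARY KEY.
        db.execute("INSERT INTO processed_payments (id) VALUES (?)",
                   (event["payment_id"],))
    except sqlite3.IntegrityError:
        return                             # duplicate delivery; skip
    process_payment(event)

handle_payment_event({"payment_id": "p-1"})
handle_payment_event({"payment_id": "p-1"})   # redelivered duplicate
print(calls)   # ['p-1'] -- processed exactly once
```

Claiming the key before processing means a crash mid-processing loses that message's work, so production systems typically perform the insert and the side effect inside one transaction.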


A DLQ is a queue that receives messages that couldn’t be processed successfully after N retry attempts. Instead of losing the message or blocking normal flow, it’s parked for inspection and reprocessing.

[Normal Queue] → [Consumer] (fails 3 times)
                      ↓
                   [DLQ] → [Ops team investigates / replays]

Always configure a DLQ in production. Without one, poison messages (messages that always cause processing failures) block queues or cause infinite retry loops.
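A minimal retry-then-DLQ consumer loop, sketched with in-memory queues (the three-attempt limit mirrors the diagram above; `handle` is a stand-in for real processing):

```python
import queue

MAX_ATTEMPTS = 3
main_q: "queue.Queue[dict]" = queue.Queue()
dlq: "queue.Queue[dict]" = queue.Queue()

def handle(msg: dict) -> None:
    if msg.get("poison"):
        raise ValueError("cannot process")   # simulated poison message

def consume_once() -> None:
    msg = main_q.get()
    try:
        handle(msg)
    except Exception:
        msg["attempts"] = msg.get("attempts", 0) + 1
        if msg["attempts"] >= MAX_ATTEMPTS:
            dlq.put(msg)        # park it for inspection; don't block the queue
        else:
            main_q.put(msg)     # requeue for another attempt

main_q.put({"id": 1, "poison": True})
main_q.put({"id": 2})
while not main_q.empty():
    consume_once()

print(dlq.qsize())   # 1 -- the poison message ended up in the DLQ
```

The healthy message flows through unimpeded while the poison message is retried a bounded number of times and then quarantined, instead of looping forever at the head of the queue.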


Choreography: services react to events independently, with no central orchestrator.

[Order Placed Event]
→ Payment Service (subscribes, charges card, emits "Payment Processed")
→ Inventory Service (subscribes, reserves stock, emits "Stock Reserved")
→ Email Service (subscribes to "Payment Processed", sends confirmation)
  • Pros: Loose coupling, services evolve independently
  • Cons: Hard to trace the overall flow; debugging requires aggregating events across services
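Choreography can be sketched as handlers subscribed to event types, each emitting follow-up events (all names are illustrative):

```python
from collections import defaultdict
from typing import Callable

handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
log: list[str] = []

def on(event_type: str):
    """Decorator: subscribe a handler to an event type."""
    def register(fn):
        handlers[event_type].append(fn)
        return fn
    return register

def emit(event_type: str, **data) -> None:
    log.append(event_type)
    for fn in handlers[event_type]:
        fn(data)

@on("OrderPlaced")
def charge_card(data: dict) -> None:
    emit("PaymentProcessed", order_id=data["order_id"])

@on("OrderPlaced")
def reserve_stock(data: dict) -> None:
    emit("StockReserved", order_id=data["order_id"])

@on("PaymentProcessed")
def send_confirmation(data: dict) -> None:
    log.append(f"email:{data['order_id']}")

emit("OrderPlaced", order_id="o-7")
print(log)   # ['OrderPlaced', 'PaymentProcessed', 'email:o-7', 'StockReserved']
```

No function calls another by name: the whole flow emerges from subscriptions, which is exactly why it is flexible to extend and hard to trace end to end.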

Orchestration: a central orchestrator service coordinates the workflow.

[Order Orchestrator]
→ calls Payment Service (waits for result)
→ calls Inventory Service (waits for result)
→ calls Email Service (waits for result)
  • Pros: Clear single source of truth for workflow state; easier to monitor and debug
  • Cons: Orchestrator becomes a coupling point; can become a bottleneck

Use choreography for simple, loosely coupled event flows. Use orchestration for complex, long-running workflows with compensation logic (see Saga pattern below).

For distributed transactions spanning multiple services (each with their own database), the Saga pattern breaks the transaction into a sequence of local transactions, each publishing an event to trigger the next.

If any step fails, compensating transactions roll back the previous steps:

1. Order Service: Create Order → emit "OrderCreated"
2. Payment Service: Charge Card → emit "PaymentProcessed"
3. Inventory: Reserve Stock → emit "StockReserved"
// If step 3 fails:
3'. Inventory: (fails) → emit "StockReservationFailed"
2'. Payment Service: Refund Card → emit "PaymentRefunded" (compensating)
1'. Order Service: Cancel Order → (saga complete)
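The compensation logic can be sketched as a list of (action, compensation) pairs, unwinding completed steps in reverse on failure (a simplification: real sagas persist state and run across service boundaries):

```python
completed: list[str] = []

def create_order(order: dict) -> bool:
    completed.append("create_order")
    return True

def charge_card(order: dict) -> bool:
    completed.append("charge_card")
    return True

def reserve_stock(order: dict) -> bool:
    return order["in_stock"]              # the step that may fail

def cancel_order(order: dict) -> None:
    completed.append("cancel_order")      # compensates create_order

def refund_card(order: dict) -> None:
    completed.append("refund_card")       # compensates charge_card

SAGA = [(create_order, cancel_order),
        (charge_card, refund_card),
        (reserve_stock, None)]            # final step: nothing defined to undo it

def run_saga(steps, order: dict) -> str:
    """Run local transactions in order; on failure, compensate in reverse."""
    done = []
    for action, compensate in steps:
        if action(order):
            done.append(compensate)
        else:
            for comp in reversed(done):   # roll back what already committed
                if comp:
                    comp(order)
            return "rolled back"
    return "committed"

print(run_saga(SAGA, {"in_stock": False}))   # rolled back
# completed: create_order, charge_card, refund_card, cancel_order
```

When stock reservation fails, the refund runs before the order cancellation, mirroring steps 2' and 1' above: compensations always run in reverse order of the original transactions.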

Backpressure is the mechanism by which a consumer signals to a producer to slow down when it’s overwhelmed.

Without backpressure, a fast producer can overwhelm a slow consumer, causing queue depth to grow unboundedly and eventually crash the consumer or exhaust memory.

Techniques:

  • Consumer-controlled polling: Consumer pulls messages at its own pace (Kafka’s model — consumers poll, they’re not pushed to)
  • Prefetch limit: RabbitMQ basic.qos limits how many unacknowledged messages a consumer receives
  • Queue depth alerting: Alert when queue depth exceeds threshold; scale consumers or investigate slow processing
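The simplest form of backpressure, a bounded buffer that blocks the producer when full, can be shown with Python's `queue.Queue(maxsize=...)` (analogous in spirit to a prefetch limit):

```python
import queue
import threading
import time

buf: "queue.Queue[int]" = queue.Queue(maxsize=5)   # bounded buffer = built-in backpressure
produced: list[int] = []
consumed: list[int] = []

def producer() -> None:
    for i in range(20):
        buf.put(i)            # blocks while the buffer is full: the producer slows down
        produced.append(i)

def consumer() -> None:
    for _ in range(20):
        item = buf.get()
        time.sleep(0.001)     # deliberately slow consumer
        consumed.append(item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()

print(len(consumed), "messages consumed; queue depth never exceeded", buf.maxsize)
```

The producer's rate is automatically throttled to the consumer's, and memory use is capped by the buffer size: the unbounded-queue failure mode described above cannot occur.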