
System Design Case Studies

System design is where all the individual concepts — databases, caching, queues, load balancing, APIs — get assembled into a coherent architecture that solves a real problem at scale.

These case studies walk through the design process for four classic problems. Each follows the same structure: clarify requirements, estimate scale, design the system, then identify the hard problems.


Case Study 1: Designing a URL Shortener

Problem: Design a service like bit.ly. Given a long URL, return a short code. Given a short code, redirect to the original URL.

Functional:

  • POST /shorten — accepts a long URL, returns a short URL (e.g., sho.rt/xK3p9)
  • GET /{code} — redirects to the original URL
  • (Optional) Custom aliases, link expiry, click analytics

Non-functional:

  • Read-heavy: 100:1 read-to-write ratio
  • Redirect latency must be < 10ms (p99)
  • High availability: links must never 404 after creation (unless explicitly expired)

Scale estimation:

  • 100M new URLs per day = ~1,200 writes/sec
  • 10B redirects per day = ~120,000 reads/sec
  • URL store: 100M/day × 365 days × 5 years × ~500 bytes = ~90 TB over 5 years
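
The arithmetic behind these estimates, as a quick sanity check:

```python
# Back-of-envelope numbers from the requirements above.
writes_per_day = 100_000_000      # 100M new URLs/day
reads_per_day = 10_000_000_000    # 10B redirects/day
seconds_per_day = 86_400

writes_per_sec = writes_per_day / seconds_per_day  # ≈ 1,157 → "~1,200 writes/sec"
reads_per_sec = reads_per_day / seconds_per_day    # ≈ 115,741 → "~120,000 reads/sec"

# Storage: 5 years of URLs at ~500 bytes per row
bytes_total = writes_per_day * 365 * 5 * 500
tb_total = bytes_total / 1e12                      # ≈ 91 TB

print(round(writes_per_sec), round(reads_per_sec), round(tb_total, 2))
```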

Short code generation:

Option 1 — Hash + truncate:

import hashlib

def short_code(long_url: str) -> str:
    digest = hashlib.md5(long_url.encode()).hexdigest()
    return digest[:7]  # Collision risk: handle with retry or DB check

Option 2 — Base62 encoding of auto-increment ID:

CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n: int) -> str:
    result = []
    while n:
        result.append(CHARS[n % 62])
        n //= 62
    return "".join(reversed(result)) or CHARS[0]

encode(1000000)  # → "4c92"

Base62 with 7 characters = 62^7 ≈ 3.5 trillion unique codes. Won’t run out.

Database:

urls table:
id BIGINT PRIMARY KEY (auto-increment)
short_code VARCHAR(10) UNIQUE INDEX
long_url TEXT
created_at TIMESTAMP
expires_at TIMESTAMP (nullable)
user_id BIGINT (nullable)

Redirect flow:

Client → [CDN / Edge Cache] → [App Server]
Redis cache lookup (short_code → long_url)
↓ (cache miss)
DB lookup → populate cache
HTTP 301 Redirect → long_url
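
A minimal sketch of the cache-aside lookup above, using plain dicts to stand in for Redis and the database (`resolve`, `cache`, and `db` are illustrative names):

```python
def resolve(code, cache, db):
    """Cache-aside: try the cache, fall back to the DB, repopulate on a miss."""
    url = cache.get(code)
    if url is not None:
        return url
    url = db.get(code)        # cache miss: one database round-trip
    if url is not None:
        cache[code] = url     # populate so subsequent redirects skip the DB
    return url

db = {"xK3p9": "https://example.com/very/long/path"}
cache = {}
first = resolve("xK3p9", cache, db)    # miss: hits the DB and fills the cache
second = resolve("xK3p9", cache, db)   # hit: served straight from the cache
```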

HTTP 301 vs 302:

  • 301 Moved Permanently: The browser caches the redirect, so future requests never hit your server. Saves load but prevents analytics tracking.
  • 302 Found (temporary): The browser doesn’t cache, so every redirect hits your server — which enables click counting. Use 302 for analytics-enabled links.

Hard problems:

  1. Cache stampede on popular links: A viral link expires from cache; thousands of requests simultaneously hit the DB. Solution: cache-aside with a mutex or probabilistic early expiry.
  2. Custom alias collisions: User wants sho.rt/my-brand but it’s taken. Return 409 Conflict.
  3. Analytics at scale: Don’t write a DB row per click (120k writes/sec is too much). Emit click events to Kafka; batch-aggregate into analytics tables.
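
A single-process sketch of the mutex approach to stampede protection, using `threading.Lock` in place of a distributed lock (in production the lock would be a Redis `SET key NX EX`); all names here are illustrative:

```python
import threading

cache = {}
locks = {}                      # one lock per key; in Redis: SET lock:{key} NX EX
locks_guard = threading.Lock()  # protects the locks dict itself
db_hits = 0

def load_from_db(code: str) -> str:
    global db_hits
    db_hits += 1                # count DB round-trips to show the stampede is absorbed
    return f"https://example.com/{code}"

def get_url(code: str) -> str:
    url = cache.get(code)
    if url is not None:
        return url
    with locks_guard:
        lock = locks.setdefault(code, threading.Lock())
    with lock:                  # only one caller recomputes per key
        url = cache.get(code)   # re-check: another thread may have filled the cache
        if url is None:
            url = load_from_db(code)
            cache[code] = url
    return url

threads = [threading.Thread(target=get_url, args=("xK3p9",)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_hits)   # 1 — only one thread hit the DB despite 50 concurrent misses
```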

Case Study 2: Designing a Rate Limiter

Problem: Design a rate limiter that enforces API quotas (e.g., 100 requests per user per minute). It must be fast, accurate, and work across multiple app server instances.

Functional:

  • Limit requests per user (or IP, API key) per time window
  • Return 429 Too Many Requests with Retry-After header when limit exceeded
  • Configurable limits per user tier (Free: 100/min, Pro: 10,000/min)

Non-functional:

  • Decision must be made in < 5ms (can’t slow down the API significantly)
  • Must work across distributed app server fleet (per-instance counters won’t work)
  • Limits must be consistent — a user can’t bypass by hitting different servers

Sliding Window Counter in Redis:

import time
import redis

r = redis.Redis()

def is_allowed(user_id: str, limit: int, window_sec: int = 60) -> bool:
    now = time.time()
    window_start = now - window_sec
    key = f"rate:{user_id}"
    pipe = r.pipeline()
    # Remove old entries outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count requests in current window
    pipe.zcard(key)
    # Add this request (timestamp as member and score; requests landing on the
    # same timestamp would need a unique suffix to avoid colliding)
    pipe.zadd(key, {str(now): now})
    # Set TTL so the key auto-expires
    pipe.expire(key, window_sec)
    results = pipe.execute()
    request_count = results[1]
    return request_count < limit

This uses a Redis Sorted Set where the score = timestamp. The window slides with every request — accurate, no boundary spikes.

Middleware integration:

from fastapi import Response

@app.middleware("http")
async def rate_limit_middleware(request, call_next):
    user_id = request.headers.get("X-User-ID", request.client.host)
    limit = get_user_limit(user_id)  # Look up tier from DB/cache
    if not is_allowed(user_id, limit):
        return Response(
            content='{"error": "rate_limit_exceeded"}',
            status_code=429,
            headers={"Retry-After": "60"},
        )
    return await call_next(request)
Hard problems:

  1. Redis single point of failure: Use Redis Cluster or Redis Sentinel. For the simplest production setup, Amazon ElastiCache with Multi-AZ.
  2. Race conditions: The pipeline() in Redis ensures the check-then-increment is atomic enough for most use cases. For exact consistency, use a Lua script (executed atomically on the Redis server).
  3. Distributed token bucket: For burst-friendly rate limiting (allow 10 requests instantly but no more than 100/min), implement Token Bucket in Redis with a background refill script.
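
A single-node sketch of Token Bucket (in production the bucket state and refill logic would live in Redis behind a Lua script so all app servers share one bucket; the `TokenBucket` class here is illustrative):

```python
import time

class TokenBucket:
    """Capacity allows bursts; the refill rate enforces the sustained limit."""
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Burst of 10 allowed instantly, then sustained 100 requests/minute
bucket = TokenBucket(capacity=10, refill_per_sec=100 / 60)
burst = [bucket.allow() for _ in range(11)]   # first 10 True, 11th False
```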

Case Study 3: Designing a Notification Service

Problem: Design a service that delivers notifications (push, email, SMS) to users at scale based on events from other services.

Functional:

  • Send notifications via push (iOS, Android), email, and SMS
  • Other services trigger notifications by publishing events: order.placed, payment.failed, friend.request
  • User preferences: users can opt out of specific notification types per channel
  • Template-based: notification content is rendered from configurable templates

Non-functional:

  • At-least-once delivery (don’t drop notifications; duplicates are acceptable if rare)
  • Email: deliver within 1 minute. Push: deliver within 5 seconds.
  • Scale to 1M notifications/day

Architecture:

[Order Service]   ─┐
[Payment Service] ─┼─→ [Event Bus (Kafka)] → [Notification Service]
[Social Service]  ─┘                                  ↓
                                           [Preference Check] → skip if opted out
                                                      ↓
                                           [Template Renderer]
                        ┌─────────────────────────────┼─────────────────────────────┐
                        ↓                             ↓                             ↓
                  [Push Queue]                 [Email Queue]                   [SMS Queue]
                        ↓                             ↓                             ↓
                 [Push Worker]                [Email Worker]                  [SMS Worker]
                  (FCM/APNs)                  (SendGrid/SES)                  (Twilio/SNS)

Key design decisions:

  • Per-channel queues: Push, Email, and SMS are in separate queues with separate worker pools. A SendGrid outage doesn’t block push notifications.
  • At-least-once + idempotency key: Each notification gets a UUID. Third-party providers (FCM, SendGrid) accept idempotency keys — safe to retry.
  • Preference service: Cached in Redis. user:{id}:prefs contains their channel opt-outs. Checked before rendering to avoid unnecessary work.
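
A sketch of attaching the idempotency key when a notification is built (the field names are illustrative, not from the original design):

```python
import uuid

def build_notification(event: dict, channel: str) -> dict:
    """Attach a stable UUID so retries can be deduplicated downstream."""
    return {
        "id": str(uuid.uuid4()),      # idempotency key forwarded to FCM/SendGrid/Twilio
        "channel": channel,
        "event_type": event["type"],
        "user_id": event["user_id"],
    }

msg = build_notification({"type": "order.placed", "user_id": 42}, channel="email")
```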

Template rendering:

# Template stored in DB: "Your order {{order_id}} has been placed."
from jinja2 import Template  # assuming Jinja2-style templates

def render_notification(template_id: str, context: dict) -> str:
    template = template_cache.get(template_id)  # template_cache: app-level cache
    return Template(template).render(**context)

Hard problems:

  1. Delivery tracking: Log every notification attempt + outcome (delivered, bounced, failed). Surface this in a user-facing “Notification History” UI.
  2. Thundering herd: A flash sale triggers 1M push notifications simultaneously. Use rate-throttled workers that pace sends at 50k/sec to avoid overwhelming FCM rate limits.
  3. User timezone for scheduled notifications: “Send at 9am in user’s timezone” requires scheduling infrastructure (Celery beat, Temporal, or a cron-like service).

Case Study 4: Designing a Simple Key-Value Store


Problem: Design a distributed key-value store that supports get(key) and put(key, value) operations, is highly available, and can tolerate node failures.

Functional:

  • PUT key value — store a key-value pair
  • GET key — retrieve a value
  • DELETE key
  • Values up to 1 MB in size

Non-functional:

  • High availability (reads available even during node failure)
  • Eventual consistency is acceptable
  • Horizontal scalability — add nodes to increase capacity

Consistent Hashing:

Rather than hash(key) % num_nodes (which requires remapping all keys when nodes are added/removed), use a consistent hash ring:

                          "node-A" at position 10
 key "user:42"                     |
 hash = 8   ─────────────────────  ○  ───────────────────── "node-B" at position 25
 → routes to node-A                |    (key goes to the next node clockwise)
                                   |
                          "node-C" at position 50

When a node is added or removed, only the keys on adjacent nodes need to be remapped — not the entire dataset.
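
A minimal ring in Python (no virtual nodes; `HashRing` and its method names are illustrative, not from a real library):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring sketch."""
    def __init__(self):
        self._positions = []   # sorted node positions on the ring
        self._nodes = {}       # position -> node name

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, name: str) -> None:
        pos = self._hash(name)
        bisect.insort(self._positions, pos)
        self._nodes[pos] = name

    def get_node(self, key: str) -> str:
        # First node clockwise from the key's position, wrapping past the end
        i = bisect.bisect(self._positions, self._hash(key)) % len(self._positions)
        return self._nodes[self._positions[i]]

ring = HashRing()
for n in ("node-A", "node-B", "node-C"):
    ring.add_node(n)
owner = ring.get_node("user:42")   # the same key always maps to the same node
```

Adding a node only captures the arc between it and its predecessor, so keys outside that arc keep their owners.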

Replication factor = 3:

Each key is stored on 3 consecutive nodes on the ring. If one node is down, the other two still serve the key.

Read/Write Quorum:

Using N=3, W=2, R=2:

  • A write is acknowledged after 2 of 3 nodes confirm (W=2)
  • A read returns after 2 of 3 nodes respond (R=2)
  • As long as R + W > N, reads will always see the latest write (quorum intersection)

# Simplified write path
def put(key: str, value: bytes):
    nodes = ring.get_nodes(key, count=3)  # 3 replicas
    acks = 0
    for node in nodes:
        if node.put(key, value):
            acks += 1
        if acks >= W:        # W = 2
            return "OK"      # Quorum reached
    raise Exception("Write quorum not reached")

Conflict Resolution:

When two nodes have different values for the same key (due to concurrent writes during a partition), use vector clocks or last-write-wins (LWW) with timestamps.

  • LWW: Simpler, slight risk of data loss if clocks drift
  • Vector clocks: More accurate, more complex to implement (Dynamo’s approach)
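
A sketch of the LWW merge rule, assuming each replica stores values as (timestamp, value) pairs (the tuple representation is illustrative):

```python
def lww_merge(a: tuple, b: tuple) -> tuple:
    """Last-write-wins: the (timestamp, value) pair with the later timestamp survives."""
    return max(a, b)   # tuple comparison: timestamp first, value as a deterministic tiebreak

v1 = (1700000000.0, b"blue")    # written on replica 1
v2 = (1700000005.0, b"green")   # written on replica 2 during the partition
winner = lww_merge(v1, v2)      # v2 wins: later timestamp
```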
Hard problems:

  1. Gossip protocol: Nodes discover each other’s state (alive/dead, data ownership) via gossip — periodic peer-to-peer state exchange — rather than a central coordinator.
  2. Hinted handoff: If a node is temporarily down during a write, another node temporarily stores the data with a hint. When the down node recovers, the hint is replayed.
  3. Anti-entropy / Merkle trees: Background process that compares key ranges between replicas using a Merkle tree and syncs any differences, ensuring eventual consistency even without write-triggered replication.