
System Design Case Studies

System design is where all the individual concepts — databases, caching, queues, load balancing, APIs — get assembled into a coherent architecture that solves a real problem at scale.

These case studies walk through the design process for four classic problems. Each follows the same structure: clarify requirements, estimate scale, design the system, then identify the hard problems.


Case Study 1: Designing a URL Shortener

Problem: Design a service like bit.ly. Given a long URL, return a short code. Given a short code, redirect to the original URL.

Functional:

  • POST /shorten — accepts a long URL, returns a short URL (e.g., sho.rt/xK3p9)
  • GET /{code} — redirects to the original URL
  • (Optional) Custom aliases, link expiry, click analytics

Non-functional:

  • Read-heavy: 100:1 read-to-write ratio
  • Redirect latency must be < 10ms (p99)
  • High availability: links must never 404 after creation (unless explicitly expired)

Scale estimation:

  • 100M new URLs per day = ~1,200 writes/sec
  • 10B redirects per day = ~120,000 reads/sec
  • URL store: 100M/day × 365 days × 5 years × ~500 bytes = ~90 TB over 5 years
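
The arithmetic behind these estimates, as a quick sanity check:

```python
# Back-of-envelope numbers from the requirements above.
writes_per_day = 100_000_000      # 100M new URLs/day
reads_per_day = 10_000_000_000    # 10B redirects/day
seconds_per_day = 86_400

writes_per_sec = writes_per_day / seconds_per_day  # ≈ 1,157 → "~1,200 writes/sec"
reads_per_sec = reads_per_day / seconds_per_day    # ≈ 115,741 → "~120,000 reads/sec"

# Storage: 5 years of URLs at ~500 bytes per row
bytes_total = writes_per_day * 365 * 5 * 500
tb_total = bytes_total / 1e12                      # ≈ 91 TB

print(round(writes_per_sec), round(reads_per_sec), round(tb_total, 2))
```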

Short code generation:

Option 1 — Hash + truncate:

import hashlib

def short_code(long_url: str) -> str:
    digest = hashlib.md5(long_url.encode()).hexdigest()
    return digest[:7]  # Collision risk: handle with retry or DB check

Option 2 — Base62 encoding of auto-increment ID:

CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n: int) -> str:
    result = []
    while n:
        result.append(CHARS[n % 62])
        n //= 62
    return "".join(reversed(result)) or CHARS[0]

encode(1000000)  # → "4c92"

Base62 with 7 characters = 62^7 ≈ 3.5 trillion unique codes. Won’t run out.

Database:

urls table:
id BIGINT PRIMARY KEY (auto-increment)
short_code VARCHAR(10) UNIQUE INDEX
long_url TEXT
created_at TIMESTAMP
expires_at TIMESTAMP (nullable)
user_id BIGINT (nullable)

Redirect flow:

Client → [CDN / Edge Cache] → [App Server]
Redis cache lookup (short_code → long_url)
↓ (cache miss)
DB lookup → populate cache
HTTP 301 Redirect → long_url
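
A minimal sketch of the cache-aside lookup above, using plain dicts to stand in for Redis and the database (`resolve`, `cache`, and `db` are illustrative names):

```python
def resolve(code, cache, db):
    """Cache-aside: try the cache, fall back to the DB, repopulate on a miss."""
    url = cache.get(code)
    if url is not None:
        return url
    url = db.get(code)        # cache miss: one database round-trip
    if url is not None:
        cache[code] = url     # populate so subsequent redirects skip the DB
    return url

db = {"xK3p9": "https://example.com/very/long/path"}
cache = {}
first = resolve("xK3p9", cache, db)    # miss: hits the DB and fills the cache
second = resolve("xK3p9", cache, db)   # hit: served straight from the cache
```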

HTTP 301 vs 302:

  • 301 Moved Permanently: The browser caches the redirect, so future requests never hit your server. Saves load but prevents analytics tracking.
  • 302 Found (temporary): The browser doesn’t cache, so every redirect hits your server — which enables click counting. Use 302 for analytics-enabled links.

Hard problems:

  1. Cache stampede on popular links: A viral link expires from cache; thousands of requests simultaneously hit the DB. Solution: cache-aside with a mutex or probabilistic early expiry.
  2. Custom alias collisions: User wants sho.rt/my-brand but it’s taken. Return 409 Conflict.
  3. Analytics at scale: Don’t write a DB row per click (120k writes/sec is too much). Emit click events to Kafka; batch-aggregate into analytics tables.
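
A single-process sketch of the mutex approach to stampede protection, using `threading.Lock` in place of a distributed lock (in production the lock would be a Redis `SET key NX EX`); all names here are illustrative:

```python
import threading

cache = {}
locks = {}                      # one lock per key; in Redis: SET lock:{key} NX EX
locks_guard = threading.Lock()  # protects the locks dict itself
db_hits = 0

def load_from_db(code: str) -> str:
    global db_hits
    db_hits += 1                # count DB round-trips to show the stampede is absorbed
    return f"https://example.com/{code}"

def get_url(code: str) -> str:
    url = cache.get(code)
    if url is not None:
        return url
    with locks_guard:
        lock = locks.setdefault(code, threading.Lock())
    with lock:                  # only one caller recomputes per key
        url = cache.get(code)   # re-check: another thread may have filled the cache
        if url is None:
            url = load_from_db(code)
            cache[code] = url
    return url

threads = [threading.Thread(target=get_url, args=("xK3p9",)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_hits)   # 1 — only one thread hit the DB despite 50 concurrent misses
```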

Case Study 2: Designing a Rate Limiter

Problem: Design a rate limiter that enforces API quotas (e.g., 100 requests per user per minute). It must be fast, accurate, and work across multiple app server instances.

Functional:

  • Limit requests per user (or IP, API key) per time window
  • Return 429 Too Many Requests with Retry-After header when limit exceeded
  • Configurable limits per user tier (Free: 100/min, Pro: 10,000/min)

Non-functional:

  • Decision must be made in < 5ms (can’t slow down the API significantly)
  • Must work across distributed app server fleet (per-instance counters won’t work)
  • Limits must be consistent — a user can’t bypass by hitting different servers

Sliding Window Counter in Redis:

import time
import redis

r = redis.Redis()

def is_allowed(user_id: str, limit: int, window_sec: int = 60) -> bool:
    now = time.time()
    window_start = now - window_sec
    key = f"rate:{user_id}"
    pipe = r.pipeline()
    # Remove old entries outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count requests in current window
    pipe.zcard(key)
    # Add this request (timestamp as member and score; requests landing on the
    # same timestamp would need a unique suffix to avoid colliding)
    pipe.zadd(key, {str(now): now})
    # Set TTL so the key auto-expires
    pipe.expire(key, window_sec)
    results = pipe.execute()
    request_count = results[1]
    return request_count < limit

This uses a Redis Sorted Set where the score = timestamp. The window slides with every request — accurate, no boundary spikes.

Middleware integration:

from fastapi import Response

@app.middleware("http")
async def rate_limit_middleware(request, call_next):
    user_id = request.headers.get("X-User-ID", request.client.host)
    limit = get_user_limit(user_id)  # Look up tier from DB/cache
    if not is_allowed(user_id, limit):
        return Response(
            content='{"error": "rate_limit_exceeded"}',
            status_code=429,
            headers={"Retry-After": "60"},
        )
    return await call_next(request)
Hard problems:

  1. Redis single point of failure: Use Redis Cluster or Redis Sentinel. For the simplest production setup, Amazon ElastiCache with Multi-AZ.
  2. Race conditions: The pipeline() in Redis ensures the check-then-increment is atomic enough for most use cases. For exact consistency, use a Lua script (executed atomically on the Redis server).
  3. Distributed token bucket: For burst-friendly rate limiting (allow 10 requests instantly but no more than 100/min), implement Token Bucket in Redis with a background refill script.
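
A single-node sketch of Token Bucket (in production the bucket state and refill logic would live in Redis behind a Lua script so all app servers share one bucket; the `TokenBucket` class here is illustrative):

```python
import time

class TokenBucket:
    """Capacity allows bursts; the refill rate enforces the sustained limit."""
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Burst of 10 allowed instantly, then sustained 100 requests/minute
bucket = TokenBucket(capacity=10, refill_per_sec=100 / 60)
burst = [bucket.allow() for _ in range(11)]   # first 10 True, 11th False
```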

Case Study 3: Designing a Notification Service

Problem: Design a service that delivers notifications (push, email, SMS) to users at scale based on events from other services.

Functional:

  • Send notifications via push (iOS, Android), email, and SMS
  • Other services trigger notifications by publishing events: order.placed, payment.failed, friend.request
  • User preferences: users can opt out of specific notification types per channel
  • Template-based: notification content is rendered from configurable templates

Non-functional:

  • At-least-once delivery (don’t drop notifications; duplicates are acceptable if rare)
  • Email: deliver within 1 minute. Push: deliver within 5 seconds.
  • Scale to 1M notifications/day

Architecture:

[Order Service]   ─┐
[Payment Service] ─┼─→ [Event Bus (Kafka)] → [Notification Service]
[Social Service]  ─┘                                  ↓
                                           [Preference Check] → skip if opted out
                                                      ↓
                                           [Template Renderer]
                        ┌─────────────────────────────┼─────────────────────────────┐
                        ↓                             ↓                             ↓
                  [Push Queue]                 [Email Queue]                   [SMS Queue]
                        ↓                             ↓                             ↓
                 [Push Worker]                [Email Worker]                  [SMS Worker]
                  (FCM/APNs)                  (SendGrid/SES)                  (Twilio/SNS)

Key design decisions:

  • Per-channel queues: Push, Email, and SMS are in separate queues with separate worker pools. A SendGrid outage doesn’t block push notifications.
  • At-least-once + idempotency key: Each notification gets a UUID. Third-party providers (FCM, SendGrid) accept idempotency keys — safe to retry.
  • Preference service: Cached in Redis. user:{id}:prefs contains their channel opt-outs. Checked before rendering to avoid unnecessary work.
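
A sketch of attaching the idempotency key when a notification is built (the field names are illustrative, not from the original design):

```python
import uuid

def build_notification(event: dict, channel: str) -> dict:
    """Attach a stable UUID so retries can be deduplicated downstream."""
    return {
        "id": str(uuid.uuid4()),      # idempotency key forwarded to FCM/SendGrid/Twilio
        "channel": channel,
        "event_type": event["type"],
        "user_id": event["user_id"],
    }

msg = build_notification({"type": "order.placed", "user_id": 42}, channel="email")
```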

Template rendering:

# Template stored in DB: "Your order {{order_id}} has been placed."
from jinja2 import Template  # assuming Jinja2-style templates

def render_notification(template_id: str, context: dict) -> str:
    template = template_cache.get(template_id)  # template_cache: app-level cache
    return Template(template).render(**context)

Hard problems:

  1. Delivery tracking: Log every notification attempt + outcome (delivered, bounced, failed). Surface this in a user-facing “Notification History” UI.
  2. Thundering herd: A flash sale triggers 1M push notifications simultaneously. Use rate-throttled workers that pace sends at 50k/sec to avoid overwhelming FCM rate limits.
  3. User timezone for scheduled notifications: “Send at 9am in user’s timezone” requires scheduling infrastructure (Celery beat, Temporal, or a cron-like service).

Case Study 4: Designing a Simple Key-Value Store


Problem: Design a distributed key-value store that supports get(key) and put(key, value) operations, is highly available, and can tolerate node failures.

Functional:

  • PUT key value — store a key-value pair
  • GET key — retrieve a value
  • DELETE key
  • Values up to 1 MB in size

Non-functional:

  • High availability (reads available even during node failure)
  • Eventual consistency is acceptable
  • Horizontal scalability — add nodes to increase capacity

Consistent Hashing:

Rather than hash(key) % num_nodes (which requires remapping all keys when nodes are added/removed), use a consistent hash ring:

                          "node-A" at position 10
 key "user:42"                     |
 hash = 8   ─────────────────────  ○  ───────────────────── "node-B" at position 25
 → routes to node-A                |    (key goes to the next node clockwise)
                                   |
                          "node-C" at position 50

When a node is added or removed, only the keys on adjacent nodes need to be remapped — not the entire dataset.
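
A minimal ring in Python (no virtual nodes; `HashRing` and its method names are illustrative, not from a real library):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring sketch."""
    def __init__(self):
        self._positions = []   # sorted node positions on the ring
        self._nodes = {}       # position -> node name

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, name: str) -> None:
        pos = self._hash(name)
        bisect.insort(self._positions, pos)
        self._nodes[pos] = name

    def get_node(self, key: str) -> str:
        # First node clockwise from the key's position, wrapping past the end
        i = bisect.bisect(self._positions, self._hash(key)) % len(self._positions)
        return self._nodes[self._positions[i]]

ring = HashRing()
for n in ("node-A", "node-B", "node-C"):
    ring.add_node(n)
owner = ring.get_node("user:42")   # the same key always maps to the same node
```

Adding a node only captures the arc between it and its predecessor, so keys outside that arc keep their owners.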

Replication factor = 3:

Each key is stored on 3 consecutive nodes on the ring. If one node is down, the other two still serve the key.

Read/Write Quorum:

Using N=3, W=2, R=2:

  • A write is acknowledged after 2 of 3 nodes confirm (W=2)
  • A read returns after 2 of 3 nodes respond (R=2)
  • As long as R + W > N, reads will always see the latest write (quorum intersection)

# Simplified write path
def put(key: str, value: bytes):
    nodes = ring.get_nodes(key, count=3)  # 3 replicas
    acks = 0
    for node in nodes:
        if node.put(key, value):
            acks += 1
        if acks >= W:        # W = 2
            return "OK"      # Quorum reached
    raise Exception("Write quorum not reached")

Conflict Resolution:

When two nodes have different values for the same key (due to concurrent writes during a partition), use vector clocks or last-write-wins (LWW) with timestamps.

  • LWW: Simpler, slight risk of data loss if clocks drift
  • Vector clocks: More accurate, more complex to implement (Dynamo’s approach)
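
A sketch of the LWW merge rule, assuming each replica stores values as (timestamp, value) pairs (the tuple representation is illustrative):

```python
def lww_merge(a: tuple, b: tuple) -> tuple:
    """Last-write-wins: the (timestamp, value) pair with the later timestamp survives."""
    return max(a, b)   # tuple comparison: timestamp first, value as a deterministic tiebreak

v1 = (1700000000.0, b"blue")    # written on replica 1
v2 = (1700000005.0, b"green")   # written on replica 2 during the partition
winner = lww_merge(v1, v2)      # v2 wins: later timestamp
```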
Hard problems:

  1. Gossip protocol: Nodes discover each other’s state (alive/dead, data ownership) via gossip — periodic peer-to-peer state exchange — rather than a central coordinator.
  2. Hinted handoff: If a node is temporarily down during a write, another node temporarily stores the data with a hint. When the down node recovers, the hint is replayed.
  3. Anti-entropy / Merkle trees: Background process that compares key ranges between replicas using a Merkle tree and syncs any differences, ensuring eventual consistency even without write-triggered replication.