System Design Case Studies
System design is where all the individual concepts — databases, caching, queues, load balancing, APIs — get assembled into a coherent architecture that solves a real problem at scale.
These case studies walk through the design process for four classic problems. Each follows the same structure: clarify requirements, estimate scale, design the system, then identify the hard problems.
Case Study 1: URL Shortener
Problem: Design a service like bit.ly. Given a long URL, return a short code. Given a short code, redirect to the original URL.
Requirements
Functional:
- `POST /shorten` — accepts a long URL, returns a short URL (e.g., `sho.rt/xK3p9`)
- `GET /{code}` — redirects to the original URL
- (Optional) Custom aliases, link expiry, click analytics
Non-functional:
- Read-heavy: 100:1 read-to-write ratio
- Redirect latency must be < 10ms (p99)
- High availability: links must never 404 after creation (unless explicitly expired)
Scale estimation:
- 100M new URLs per day = ~1,200 writes/sec
- 10B redirects per day = ~120,000 reads/sec
- URL store: 100M/day × 365 days × 5 years × ~500 bytes = ~90 TB over 5 years
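These figures are simple back-of-envelope arithmetic; a quick sketch to check them:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# 100M new URLs/day and 10B redirects/day, from the estimates above
writes_per_sec = 100_000_000 / SECONDS_PER_DAY      # ~1,200
reads_per_sec = 10_000_000_000 / SECONDS_PER_DAY    # ~120,000

# Storage: 100M rows/day at ~500 bytes each, kept for 5 years
tb_total = 100_000_000 * 365 * 5 * 500 / 1e12       # ~91 TB
```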
Core Design
Short code generation:
Option 1 — Hash + truncate:
```
MD5("https://example.com/very/long/path") → "9e107d9d372bb"
Take first 7 chars → "9e107d9"
# Collision risk: handle with retry or DB check
```

Option 2 — Base62 encoding of auto-increment ID:
```python
# Digit-first alphabet, so that encode(1000000) comes out as "4c92"
CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n: int) -> str:
    result = []
    while n:
        result.append(CHARS[n % 62])
        n //= 62
    return "".join(reversed(result))

encode(1000000)  # → "4c92"
```

Base62 with 7 characters = 62^7 ≈ 3.5 trillion unique codes. Won’t run out.
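The inverse mapping is symmetric; a minimal `decode` sketch (the digit-first alphabet here is an assumption, chosen so that `"4c92"` round-trips to 1,000,000):

```python
CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
INDEX = {c: i for i, c in enumerate(CHARS)}  # char → numeric value

def decode(code: str) -> int:
    """Interpret a short code as a base-62 number (inverse of encode)."""
    n = 0
    for char in code:
        n = n * 62 + INDEX[char]
    return n

decode("4c92")  # → 1000000
```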
Database:
```
urls table:
  id          BIGINT PRIMARY KEY (auto-increment)
  short_code  VARCHAR(10), UNIQUE INDEX
  long_url    TEXT
  created_at  TIMESTAMP
  expires_at  TIMESTAMP (nullable)
  user_id     BIGINT (nullable)
```

Redirect flow:
```
Client → [CDN / Edge Cache] → [App Server]
                                   ↓
                 Redis cache lookup (short_code → long_url)
                                   ↓ (cache miss)
                      DB lookup → populate cache
                                   ↓
                    HTTP 301 Redirect → long_url
```

HTTP 301 vs 302:
- `301` (Moved Permanently): Browser caches the redirect. Future requests never hit your server. Saves load but prevents analytics tracking.
- `302` (Found, a temporary redirect): Browser doesn’t cache. Every redirect hits your server — enables click counting. Use `302` for analytics-enabled links.
Hard Problems
- Cache stampede on popular links: A viral link expires from cache; thousands of requests simultaneously hit the DB. Solution: cache-aside with a mutex or probabilistic early expiry.
- Custom alias collisions: User wants `sho.rt/my-brand` but it’s taken. Return `409 Conflict`.
- Analytics at scale: Don’t write a DB row per click (120k writes/sec is too much). Emit click events to Kafka; batch-aggregate into analytics tables.
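The batch-aggregation step can be illustrated without the Kafka plumbing: consume a batch of click events and roll them up into per-code counts before one bulk write (the event shape is an assumption):

```python
from collections import Counter

def aggregate_clicks(events: list[dict]) -> dict[str, int]:
    """Roll a batch of click events into per-code counts.

    One aggregated upsert per code replaces thousands of per-click row writes.
    """
    return dict(Counter(e["short_code"] for e in events))

batch = [
    {"short_code": "xK3p9", "ts": 1700000001},
    {"short_code": "xK3p9", "ts": 1700000002},
    {"short_code": "aB7cD", "ts": 1700000003},
]
aggregate_clicks(batch)  # → {"xK3p9": 2, "aB7cD": 1}
```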
Case Study 2: Rate Limiter
Problem: Design a rate limiter that enforces API quotas (e.g., 100 requests per user per minute). It must be fast, accurate, and work across multiple app server instances.
Requirements
Functional:
- Limit requests per user (or IP, or API key) per time window
- Return `429 Too Many Requests` with a `Retry-After` header when the limit is exceeded
- Configurable limits per user tier (Free: 100/min, Pro: 10,000/min)
Non-functional:
- Decision must be made in < 5ms (can’t slow down the API significantly)
- Must work across distributed app server fleet (per-instance counters won’t work)
- Limits must be consistent — a user can’t bypass by hitting different servers
Core Design
Sliding Window Counter in Redis:
```python
import time
import redis

r = redis.Redis()

def is_allowed(user_id: str, limit: int, window_sec: int = 60) -> bool:
    now = time.time()
    window_start = now - window_sec
    key = f"rate:{user_id}"

    pipe = r.pipeline()
    # Remove old entries outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count requests in current window
    pipe.zcard(key)
    # Add this request
    pipe.zadd(key, {now: now})
    # Set TTL so the key auto-expires
    pipe.expire(key, window_sec)
    results = pipe.execute()

    request_count = results[1]
    return request_count < limit
```

This uses a Redis Sorted Set where the score = timestamp. The window slides with every request — accurate, no boundary spikes.
Middleware integration:
```python
@app.middleware("http")
async def rate_limit_middleware(request, call_next):
    user_id = request.headers.get("X-User-ID", request.client.host)
    limit = get_user_limit(user_id)  # Look up tier from DB/cache

    if not is_allowed(user_id, limit):
        return Response(
            content='{"error": "rate_limit_exceeded"}',
            status_code=429,
            headers={"Retry-After": "60"},
        )

    return await call_next(request)
```

Hard Problems
- Redis single point of failure: Use Redis Cluster or Redis Sentinel. For the absolute simplest production setup, Amazon ElastiCache with Multi-AZ.
- Race conditions: The `pipeline()` in Redis ensures the check-then-increment is atomic enough for most use cases. For exact consistency, use a Lua script (executed atomically on the Redis server).
- Distributed token bucket: For burst-friendly rate limiting (allow 10 requests instantly but no more than 100/min), implement Token Bucket in Redis with a background refill script.
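Before porting it to Redis, the Token Bucket algorithm itself can be sketched single-process (the capacity and refill rate here are illustrative):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`; refill at `refill_rate` tokens/sec."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity              # start full, so bursts work immediately
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Burst of 10 allowed instantly; sustained rate capped at ~100/min
bucket = TokenBucket(capacity=10, refill_rate=100 / 60)
results = [bucket.allow() for _ in range(11)]  # ten True, then False
```

Moving this to Redis mostly means storing `tokens` and `last_refill` per user and doing the refill-then-decrement atomically (e.g., in a Lua script).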
Case Study 3: Notification Service
Problem: Design a service that delivers notifications (push, email, SMS) to users at scale based on events from other services.
Requirements
Functional:
- Send notifications via push (iOS, Android), email, and SMS
- Other services trigger notifications by publishing events: `order.placed`, `payment.failed`, `friend.request`
- User preferences: users can opt out of specific notification types per channel
- Template-based: notification content is rendered from configurable templates
Non-functional:
- At-least-once delivery (don’t drop notifications; duplicates are acceptable if rare)
- Email: deliver within 1 minute. Push: deliver within 5 seconds.
- Scale to 1M notifications/day
Architecture
```
[Order Service]   →
[Payment Service] →  [Event Bus (Kafka)] → [Notification Service]
[Social Service]  →                                 ↓
                                   [Preference Check] → Skip if opted out
                                                    ↓
                                          [Template Renderer]
                                                    ↓
                          ┌─────────────────────────┼─────────────────────────┐
                          ↓                         ↓                         ↓
                    [Push Queue]              [Email Queue]              [SMS Queue]
                          ↓                         ↓                         ↓
                    [Push Worker]             [Email Worker]             [SMS Worker]
                     (FCM/APNs)              (SendGrid/SES)              (Twilio/SNS)
```

Key design decisions:
- Per-channel queues: Push, Email, and SMS are in separate queues with separate worker pools. A SendGrid outage doesn’t block push notifications.
- At-least-once + idempotency key: Each notification gets a UUID. Third-party providers (FCM, SendGrid) accept idempotency keys — safe to retry.
- Preference service: Cached in Redis. `user:{id}:prefs` contains their channel opt-outs. Checked before rendering to avoid unnecessary work.
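The preference check itself is a small filter; a sketch assuming opt-outs are stored per notification type as a set of channel names (this shape is an assumption):

```python
ALL_CHANNELS = {"push", "email", "sms"}

def channels_to_send(notification_type: str, opt_outs: dict[str, set]) -> set:
    """Channels a notification should go to, given the user's opt-outs.

    `opt_outs` maps notification type → opted-out channels,
    e.g. {"order.placed": {"sms"}}.
    """
    return ALL_CHANNELS - opt_outs.get(notification_type, set())

prefs = {"order.placed": {"sms"}, "friend.request": {"email", "sms"}}
channels_to_send("order.placed", prefs)  # → {"push", "email"}
```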
Template rendering:
```python
from jinja2 import Template  # assuming Jinja-style {{ }} templates

# Template stored in DB: "Your order {{order_id}} has been placed."
def render_notification(template_id: str, context: dict) -> str:
    template = template_cache.get(template_id)  # cached template lookup
    return Template(template).render(**context)
```

Hard Problems
Section titled “Hard Problems”- Delivery tracking: Log every notification attempt + outcome (delivered, bounced, failed). Surface this in a user-facing “Notification History” UI.
- Thundering herd: A flash sale triggers 1M push notifications simultaneously. Use rate-throttled workers that pace sends at 50k/sec to avoid overwhelming FCM rate limits.
- User timezone for scheduled notifications: “Send at 9am in user’s timezone” requires scheduling infrastructure (Celery beat, Temporal, or a cron-like service).
Case Study 4: Designing a Simple Key-Value Store
Problem: Design a distributed key-value store that supports get(key) and put(key, value) operations, is highly available, and can tolerate node failures.
Requirements
Functional:
- `PUT key value` — store a key-value pair
- `GET key` — retrieve a value
- `DELETE key`
- Values up to 1 MB in size
Non-functional:
- High availability (reads available even during node failure)
- Eventual consistency is acceptable
- Horizontal scalability — add nodes to increase capacity
Architecture
Consistent Hashing:
Rather than hash(key) % num_nodes (which requires remapping all keys when nodes are added/removed), use a consistent hash ring:
"node-A" at position 10 ↑ key "user:42" | hash = 8 ─────────── ○ ──────────── "node-B" at position 25 → routes to node-A | (key goes to next node clockwise) | "node-C" at position 50When a node is added or removed, only the keys on adjacent nodes need to be remapped — not the entire dataset.
Replication factor = 3:
Each key is stored on 3 consecutive nodes on the ring. If one node is down, the other two still serve the key.
Read/Write Quorum:
Using N=3, W=2, R=2:

- A write is acknowledged after 2 of 3 nodes confirm (`W=2`)
- A read returns after 2 of 3 nodes respond (`R=2`)
- As long as `R + W > N`, reads will always see the latest write (quorum intersection)
```python
# Simplified write path
def put(key: str, value: bytes):
    nodes = ring.get_nodes(key, count=3)  # 3 replicas
    acks = 0
    for node in nodes:
        if node.put(key, value):
            acks += 1
        if acks >= W:  # W = 2
            return "OK"  # Quorum reached
    raise Exception("Write quorum not reached")
```

Conflict Resolution:
When two nodes have different values for the same key (due to concurrent writes during a partition), use vector clocks or last-write-wins (LWW) with timestamps.
- LWW: Simpler, slight risk of data loss if clocks drift
- Vector clocks: More accurate, more complex to implement (Dynamo’s approach)
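A last-write-wins merge is a one-liner over (value, timestamp) pairs; a minimal sketch, assuming each replica reports its value with a write timestamp:

```python
def lww_merge(replicas: list[tuple[bytes, float]]) -> bytes:
    """Keep the value with the newest write timestamp.

    Simple, but if node clocks drift, a stale write with a fast clock can win;
    that is the data-loss risk noted above.
    """
    value, _ts = max(replicas, key=lambda pair: pair[1])
    return value

divergent = [(b"v1", 1700000001.0), (b"v2", 1700000005.5), (b"v1", 1700000001.0)]
lww_merge(divergent)  # → b"v2" (newest timestamp wins)
```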
Hard Problems
- Gossip protocol: Nodes discover each other’s state (alive/dead, data ownership) via gossip — periodic peer-to-peer state exchange — rather than a central coordinator.
- Hinted handoff: If a node is temporarily down during a write, another node temporarily stores the data with a hint. When the down node recovers, the hint is replayed.
- Anti-entropy / Merkle trees: Background process that compares key ranges between replicas using a Merkle tree and syncs any differences, ensuring eventual consistency even without write-triggered replication.
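The essence of the anti-entropy check, hashing key ranges and syncing only the ranges whose digests differ, can be sketched without a full Merkle tree (the flat range hashing here is a simplification of the real tree structure):

```python
import hashlib

NUM_RANGES = 4  # real systems use a tree over many ranges; 4 keeps this readable

def _bucket(key: str) -> int:
    # Stable hash so every replica buckets keys identically
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_RANGES

def range_hashes(store: dict[str, bytes]) -> list[str]:
    """One digest per key range; replicas exchange these instead of full data."""
    digests = [hashlib.sha256() for _ in range(NUM_RANGES)]
    for key in sorted(store):                 # sort for deterministic digests
        digests[_bucket(key)].update(key.encode() + b"\x00" + store[key])
    return [d.hexdigest() for d in digests]

def diverged_ranges(a: dict[str, bytes], b: dict[str, bytes]) -> list[int]:
    """Only ranges whose digests differ need to be synced."""
    return [i for i, (ha, hb) in enumerate(zip(range_hashes(a), range_hashes(b)))
            if ha != hb]

replica_a = {"user:1": b"alice", "user:2": b"bob"}
replica_b = {"user:1": b"alice", "user:2": b"bobby"}  # one key diverged
diverged_ranges(replica_a, replica_b)  # exactly one range needs syncing
```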