
Caching Strategies

A cache is a fast, temporary data store that sits in front of a slower data source and serves repeated reads without hitting the origin.

Caching is one of the highest-leverage performance optimizations available — a well-placed cache can reduce database queries by 90%+ and cut p99 latency from hundreds of milliseconds to single digits. It is also one of the most dangerous: a poorly implemented cache serves stale data, causes cache stampedes, and creates consistency bugs that are hard to reproduce.

“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton


Cache Layers

Caches exist at multiple layers. Understanding which layer you’re working with determines which strategies apply.

| Layer | Technology | What It Caches | Typical Latency |
|---|---|---|---|
| Browser | HTTP cache headers | Static assets, API responses | ~0 ms (local disk) |
| CDN | CloudFront, Cloudflare, Fastly | Static assets, cacheable HTML/API | ~5–50 ms (edge node) |
| Application / in-process | In-memory dict, Guava cache | Frequently accessed objects (per instance) | <1 ms |
| Distributed cache | Redis, Memcached | Shared data across all app instances | 1–5 ms |
| Database query cache | Postgres, MySQL (deprecated) | Query result sets | ~1 ms (avoided in modern DBs) |

Cache-Aside (Lazy Loading)

The most common pattern. The application manages the cache explicitly: check the cache first, fetch from the DB on a miss, then populate the cache.

```python
def get_user(user_id: int) -> User:
    # 1. Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return deserialize(cached)
    # 2. Cache miss: fetch from DB
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    # 3. Populate cache with a TTL (setex takes key, seconds, value)
    redis.setex(f"user:{user_id}", 300, serialize(user))
    return user
```

Advantages:

  • Only data that’s actually requested gets cached (no wasted memory on cold data)
  • Cache fails gracefully — if Redis is down, the app still works (just slower)

Disadvantages:

  • First request after a cache miss (or cold start) hits the DB — potential latency spike
  • Cache stampede: if a popular key expires, hundreds of requests simultaneously miss and all hit the DB at once

Stampede mitigation:

  • Mutex/lock: Only one request fetches from DB; others wait for the cache to populate
  • Probabilistic Early Expiration: Randomly refresh the cache slightly before it expires, based on estimated recomputation cost
  • Stale-while-revalidate: Return the stale value immediately and refresh in the background
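The mutex approach can be sketched in-process with a `threading.Lock` and a plain dict standing in for Redis (the `fetch_from_db` helper and key names are illustrative; a distributed cache would use `SET NX` with a TTL as the lock instead):

```python
import threading

cache: dict[str, str] = {}
cache_lock = threading.Lock()

def fetch_from_db(key: str) -> str:
    # Stand-in for the real (slow) database query
    return f"value-for-{key}"

def get_with_lock(key: str) -> str:
    value = cache.get(key)
    if value is not None:
        return value
    # Cache miss: only one thread fetches; the others block briefly,
    # then find the freshly populated entry on the re-check.
    with cache_lock:
        value = cache.get(key)  # re-check after acquiring the lock
        if value is None:
            value = fetch_from_db(key)
            cache[key] = value
    return value
```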

Write Strategies

How you write data determines your consistency guarantees and performance trade-offs.

Write-Through

Write to the cache and the database simultaneously, in the same operation.

```
Write ──► [Cache] ──► [Database]   (both or neither)
```

  • Consistency: the cache is always up to date
  • Latency: write latency equals DB write latency (the cache doesn’t speed up writes)
  • Risk: the cache fills with data that may never be read (write-heavy, low-read workloads waste cache memory)
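A minimal in-process sketch of write-through, with plain dicts standing in for the database and cache (all names here are illustrative):

```python
db: dict[int, dict] = {}
cache: dict[str, dict] = {}

def write_through(user_id: int, user: dict) -> None:
    # Both writes happen synchronously on the request path,
    # so the cache never lags behind the database.
    db[user_id] = user
    cache[f"user:{user_id}"] = user
```

In a real system the two writes should succeed or fail together, e.g. by invalidating the cache entry if the DB write raises.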

Write-Behind (Write-Back)

Write to the cache immediately; asynchronously flush to the database later.

```
Write ──► [Cache]
             └──► [async flush] ──► [Database]
```

  • Consistency: risk of data loss if the cache node fails before flushing
  • Latency: very fast writes (return as soon as the cache is updated)
  • Use for: high-write, low-criticality data (analytics events, view counts, activity feeds)
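A minimal sketch of the write-behind flow, using a `queue.Queue` and a background thread as the async flusher (dicts stand in for the cache and database; a production version would batch writes and handle retries):

```python
import queue
import threading

cache: dict[str, int] = {}
db: dict[str, int] = {}
flush_queue: queue.Queue = queue.Queue()

def write_behind(key: str, value: int) -> None:
    cache[key] = value             # fast path: done as soon as the cache is updated
    flush_queue.put((key, value))  # the database write is deferred

def flush_worker() -> None:
    while True:
        key, value = flush_queue.get()
        db[key] = value            # asynchronous flush to the slower store
        flush_queue.task_done()

threading.Thread(target=flush_worker, daemon=True).start()
```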

Write-Around

Skip the cache on writes and write directly to the database. The cache is populated only on subsequent reads (the cache-aside pattern).

  • Use for: data written once but read infrequently, where you don’t want to pollute the cache with entries that may never be re-read

Cache Invalidation

The hardest problem in caching: deciding when and how to evict stale data.

Time-to-Live (TTL)

The simplest approach: every cache entry expires after a fixed duration.

```python
redis.setex("product:42", 3600, serialize(product))  # 1-hour TTL (key, seconds, value)
```

  • Simple: no explicit invalidation code needed
  • Trade-off: data can be stale for up to TTL seconds; too low a TTL defeats the purpose of caching

TTL guidelines by data type:

| Data Type | Recommended TTL |
|---|---|
| User profile | 5–15 minutes |
| Product catalog | 1–24 hours |
| Configuration / feature flags | 1–30 minutes |
| Session data | Equal to session expiry |
| Rate-limit counters | Equal to the rate-limit window |

Explicit Invalidation

When data changes, explicitly delete or update the cache entry.

```python
def update_user(user_id: int, data: dict):
    db.update("users", user_id, data)
    redis.delete(f"user:{user_id}")  # force a re-fetch on the next read
```

  • More consistent than TTL alone: the cache is invalidated the moment data changes
  • More complex: every write path must remember to invalidate the relevant cache keys

Eviction Policies

When a cache reaches its memory limit, it must evict entries to make room. Choose the policy based on your access patterns.

| Policy | Behavior | Best For |
|---|---|---|
| LRU (Least Recently Used) | Evict the item that hasn’t been accessed the longest | General-purpose workloads (a common Redis choice) |
| LFU (Least Frequently Used) | Evict the item accessed least often | Workloads with clear hot vs. cold items |
| FIFO | Evict the oldest item regardless of access | Simple, fair eviction |
| Random | Evict a random item | Very fast; used in some CPU caches |
| TTL-based | Evict expired items first | Always-on background eviction |
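As a reference point for how LRU behaves, here is a minimal sketch built on `collections.OrderedDict` (the class name and capacity are arbitrary):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used item
```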

Redis allows configuring the eviction policy globally: `maxmemory-policy allkeys-lru` is a safe default for most use cases.
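In `redis.conf`, that pairing looks like the following (the 2 GB limit is an arbitrary example; eviction only triggers once `maxmemory` is set):

```
maxmemory 2gb
maxmemory-policy allkeys-lru
```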


CDN Caching

A CDN (Content Delivery Network) caches content at edge nodes geographically close to users. Requests are served by the nearest edge node rather than by your origin server.

CDN behavior is controlled via HTTP headers:

```
Cache-Control: public, max-age=86400        # Cache for 24 hours, anywhere
Cache-Control: private, max-age=300         # Browser cache only, not CDN
Cache-Control: no-cache                     # Must revalidate before serving
Cache-Control: no-store                     # Never cache (auth pages, etc.)
Cache-Control: s-maxage=3600, max-age=60    # CDN: 1 hr; browser: 1 min
```

Stale-While-Revalidate

A powerful directive: serve the stale cached response immediately while refreshing it in the background.

```
Cache-Control: max-age=60, stale-while-revalidate=300
```

The user gets a fast response; the CDN updates the cached content asynchronously. Users may see content up to 5 minutes old, but never experience a slow cache miss.

Cache Busting

When you deploy new static assets, you want users to get the new version immediately, but their browsers have the old version cached. The solution: include a content hash in the filename.

```html
<!-- Before: the browser caches app.js; the old version is served indefinitely -->
<script src="/app.js"></script>

<!-- After: the filename changes on every deploy; the cache is busted automatically -->
<script src="/app.d4f3a82.js"></script>
```

This way, the cache max-age can be very long (years) for hashed files, since the URL itself changes when content changes.
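A build step can generate the hashed filename in a few lines of Python; this is a sketch (the 7-character truncation mirrors the example above and is an arbitrary choice):

```python
import hashlib
from pathlib import Path

def hashed_name(filename: str, content: bytes) -> str:
    # Insert a short content digest between the stem and the extension,
    # e.g. app.js -> app.d4f3a82.js
    digest = hashlib.sha256(content).hexdigest()[:7]
    p = Path(filename)
    return f"{p.stem}.{digest}{p.suffix}"
```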


Redis Patterns

Redis is the de facto standard distributed cache. Beyond simple key-value storage, it supports data structures that enable more sophisticated patterns.

| Structure | Commands | Use Case |
|---|---|---|
| String | GET, SET, INCR | Simple cache entries, counters, rate limiting |
| Hash | HGET, HSET, HGETALL | Object fields (user profile fields individually) |
| List | LPUSH, RPOP | Queues, activity feeds, recent items |
| Set | SADD, SMEMBERS, SINTER | Unique visitors, tag membership, deduplication |
| Sorted Set | ZADD, ZRANGE | Leaderboards, ranked results, sliding-window rate limits |
For example, a fixed-window rate limiter built on INCR and EXPIRE:

```python
import time

def is_rate_limited(user_id: str, limit: int = 100, window_seconds: int = 60) -> bool:
    # One counter per user per time window
    key = f"rate:{user_id}:{int(time.time() / window_seconds)}"
    count = redis.incr(key)
    if count == 1:
        redis.expire(key, window_seconds)  # set the TTL on the first increment
    return count > limit
```

When Not to Cache

Caching is not always the answer. Avoid caching when:

  • Data changes frequently: If the cache hit rate is low, you’re adding complexity for no benefit
  • Stale data is unacceptable: Financial balances, inventory counts, anything where users must see exact current state
  • The operation is not idempotent: Caching the result of a write operation or side-effectful function is dangerous
  • The source is already fast: Caching an in-memory lookup or a well-indexed simple DB query adds overhead without meaningful benefit