Caching & CDNs: Making Your System Fast at Scale

Your database can answer any query. It just can't answer a million of them per second — at least not without becoming a bottleneck that slows down every user in your system. Caching is the solution: storing the results of expensive work somewhere fast and cheap so you don't have to redo it for every request.

This idea applies at multiple layers of a system. A cache is typically an in-memory store (like Redis or Memcached) that sits between your application and your database. A CDN (Content Delivery Network) is a global network of servers that caches your static files and API responses close to your users around the world. Both solve the same fundamental problem — reduce latency and database load — but at different points in the request path.

Caching is also one of the most common places AI-generated systems break down. AI agents readily generate code that works at 100 users but fails at 100,000. They rarely add caching on their own, and when they do, they often do it incorrectly — missing TTLs, skipping invalidation, or creating conditions for a cache stampede. Understanding caching fundamentals is what allows you to review AI-generated systems and know when something is missing or broken.

What Is a Cache?

A cache is a faster, smaller store that holds copies of data from a slower, larger store. The application checks the cache first. If the data is there (a cache hit), it returns immediately — no database involved. If it's not (a cache miss), the application falls back to the database, then usually stores the result in the cache for next time.

The performance gap is dramatic. A Redis cache lookup takes under 1 millisecond. A PostgreSQL query on a loaded production database might take 50–500ms. At scale, that difference matters enormously: if your homepage fires 10 database queries to render and each query takes 100ms, you're looking at 1 second of database time per page load. Cache those queries and you're back under 10ms.

The cache hit ratio is the key metric: hits / (hits + misses). A healthy cache serves over 90% of requests from cache. A ratio below 80% suggests the cache is underperforming: a large share of requests still reaches the database, so investigate your caching strategy, key design, or TTL settings.
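As a quick illustration of the formula, the sketch below computes the ratio from two counters. The function name `hit_ratio` is made up for this example; in Redis, the `keyspace_hits` and `keyspace_misses` fields reported by `INFO stats` are one place such counters could come from.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Return hits / (hits + misses), or 0.0 when there has been no traffic."""
    total = hits + misses
    return hits / total if total else 0.0


# 9,200 hits and 800 misses out of 10,000 lookups
print(f"{hit_ratio(9_200, 800):.1%}")  # prints 92.0%
```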

Cache-Aside: The Most Common Pattern

Cache-aside (also called lazy loading) is the default caching strategy. The name describes the pattern: the cache sits beside the database. The application is responsible for all cache interactions — checking, populating, and invalidating. The database never talks to the cache directly.

Cache-Aside (Lazy Loading)

The application checks the cache first. On a miss, it queries the database, stores the result in cache with a TTL, and returns the data. The cache fills lazily — only with data that was actually requested.

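A minimal sketch of the cache-aside flow, assuming an in-process dictionary as a stand-in for Redis or Memcached. `TTLCache`, `get_user`, and `load_from_db` are hypothetical names for illustration, and the 300-second TTL is an arbitrary choice. The clock is injectable only so the expiry behavior is easy to exercise.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Tiny in-process stand-in for Redis/Memcached (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data = {}          # key -> (expires_at, value)
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries count as a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def get_user(user_id: int, cache: TTLCache, load_from_db: Callable[[int], dict]) -> dict:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                # cache hit: no database involved
    user = load_from_db(user_id)     # cache miss: fall back to the database
    cache.set(key, user, ttl=300)    # populate lazily, always with a TTL
    return user
```

Note that the application code, not the database, performs every cache interaction here, which is what distinguishes cache-aside from the patterns that follow.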

Write-Through: Keeping Cache and Database in Sync

Write-through caching inverts the write path: instead of writing only to the database and letting the cache grow stale, every write goes through the cache and the database together. Because the cache is always in the write path, it is guaranteed to reflect the latest state after every write.

Write-Through Caching

Every write updates the cache and the database together. Reads are always served from the cache, which is guaranteed to be consistent with the database after every write.

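The write path can be sketched as follows, assuming two plain dictionaries stand in for the database and the cache. `WriteThroughStore` is a hypothetical name, and a real implementation would also need to handle a failure between the two writes.

```python
class WriteThroughStore:
    """Writes update the database and the cache together (illustration only)."""

    def __init__(self) -> None:
        self.db = {}     # stand-in for the real database
        self.cache = {}  # stand-in for Redis/Memcached

    def write(self, key: str, value: str) -> None:
        self.db[key] = value     # 1. persist to the source of truth
        self.cache[key] = value  # 2. update the cache as part of the same write

    def read(self, key: str):
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)  # only reached for keys never written through
        if value is not None:
            self.cache[key] = value
        return value
```

Writing to the database first is a design choice in this sketch: if the second step fails, the cache is merely stale rather than holding a value the database never accepted.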

There is a third pattern worth knowing: write-behind (also called write-back). The application writes only to the cache, and the cache flushes to the database asynchronously in the background. This decouples write speed from database latency, making it ideal for high-throughput write scenarios like activity counters and analytics events. The trade-off is durability: if the cache crashes before the flush completes, any unflushed writes are lost. Use write-behind only when occasional data loss is acceptable.
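A rough sketch of write-behind for an activity counter, assuming in-memory `Counter` objects stand in for both the cache and the database. `WriteBehindCounter` and its `flush` method are hypothetical; in a real system the flush would run on a timer or background worker.

```python
from collections import Counter


class WriteBehindCounter:
    """Buffers increments in memory and flushes them in batches."""

    def __init__(self) -> None:
        self.db = Counter()       # stand-in for the durable store
        self.pending = Counter()  # stand-in for the cache-side buffer

    def increment(self, key: str, amount: int = 1) -> None:
        self.pending[key] += amount  # fast path: memory only, no database call

    def flush(self) -> int:
        """Apply buffered increments to the database; return keys flushed."""
        batch, self.pending = self.pending, Counter()
        self.db.update(batch)        # Counter.update adds counts together
        return len(batch)
```

Anything still sitting in `pending` when the process dies is lost, which is the durability trade-off described above.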

TTL: Giving Cached Data an Expiry Date

A TTL (Time to Live) is a countdown attached to a cached item. When the TTL expires, the item is automatically deleted. The next request for that key gets a cache miss, fetches fresh data from the database, and re-populates the cache.

Without a TTL, cached items live forever. Your cache fills with stale data from months ago, and you have no way to force a refresh short of manually deleting entries. Always set a TTL on every cached key. Omitting it is one of the most common caching mistakes AI agents make.

The core TTL trade-off: a shorter TTL means fresher data but more cache misses and higher database load. A longer TTL means better performance but older data. The right TTL matches the actual rate of change of your data.

Data Type	Recommended TTL	Rationale
User profile (name, avatar)	5–15 minutes	Changes rarely; brief staleness is acceptable
Product listing / price	1–5 minutes	Prices change; you don't want to show wrong prices for long
Dynamic API responses	30–60 seconds	Changes often; short TTL keeps data reasonably fresh
Session tokens	Session length (minutes to hours)	Should expire when the session ends
Static web pages	5 minutes	Balance freshness vs. reducing origin server load
CSS / JS bundles (hashed filenames)	1 week (604,800 seconds)	Content-addressed: filename changes when content changes, so long TTL is safe
Images and media	1 week or more	Rarely change; maximize CDN cache efficiency
Database configuration / feature flags	1 minute	You want flag changes to propagate quickly

TTL jitter is an important safety technique: instead of setting all TTLs to exactly 300 seconds, add a small random offset (TTL = 300 + random(0, 30)). This prevents many cache entries from expiring simultaneously, which would send a synchronized wave of traffic to your database. TTL jitter is especially important when you populate the cache during startup or batch operations, where many keys are written at the same moment and would otherwise expire together.
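The jitter formula above could be written as a small helper. The name `ttl_with_jitter` and its defaults are assumptions chosen to match the 300-second example.

```python
import random


def ttl_with_jitter(base_seconds: int = 300, max_jitter_seconds: int = 30) -> int:
    """Return the base TTL plus a random offset in [0, max_jitter_seconds]."""
    return base_seconds + random.randint(0, max_jitter_seconds)


# Keys written in the same batch now expire spread over a 30-second window.
ttls = [ttl_with_jitter() for _ in range(5)]
```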

Cache Invalidation: The Hard Part

Phil Karlton's famous quote has held up for decades: "There are only two hard things in Computer Science: cache invalidation and naming things."

Cache invalidation means proactively removing or updating a cached value when the underlying data changes — rather than waiting for the TTL to expire. It sounds simple. In distributed systems, it's not.

The challenge is timing. Your application runs on multiple servers, and when one server updates the database, every cached copy of that record (in a shared cache, in each server's local memory, or at the CDN) must be invalidated at effectively the same moment. In practice, network delays mean different users may see different cached values at the same moment, some stale and some fresh.

Invalidation strategies, from simplest to most complex:

TTL-based expiry (simplest): Accept a staleness window. Set a TTL short enough that your users won't notice. This covers the majority of use cases and requires no extra code.
Explicit key deletion on write: When your code updates a record, immediately delete the corresponding cache key. The next read will fetch fresh data. This is more consistent than pure TTL but adds coupling between your write logic and cache logic.
Write-through update: Instead of deleting the key, update the cache value as part of the same write operation. This keeps the cache populated without a cold-miss after the write, but requires careful coordination to ensure both updates succeed atomically.
Stale-while-revalidate: Serve the stale cached value instantly, and refresh it in the background asynchronously. The user sees fast response times; the cache stays reasonably fresh. Widely used at the HTTP layer via Cache-Control: stale-while-revalidate=60.

AI agents almost never implement cache invalidation. They generate code that writes to the database and returns success — the cache is not touched. This means every write creates a staleness window equal to your full TTL. For a product price update, a 10-minute TTL means some users could see the old price for up to 10 minutes. Make cache invalidation an explicit part of your specification when prompting for data-write code.
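As an illustration of explicit key deletion on write, the sketch below pairs a cache-aside read with a write that deletes the matching key. The function names, the `product:<id>` key format, and the dictionaries standing in for the database and cache are all assumptions.

```python
def get_price(product_id: int, db: dict, cache: dict) -> float:
    key = f"product:{product_id}"
    if key not in cache:
        cache[key] = db[product_id]  # real code would also set a TTL here
    return cache[key]


def update_price(product_id: int, new_price: float, db: dict, cache: dict) -> None:
    db[product_id] = new_price                # 1. write to the source of truth
    cache.pop(f"product:{product_id}", None)  # 2. invalidate the cached copy


db = {42: 19.99}
cache = {}
get_price(42, db, cache)            # warms the cache with 19.99
update_price(42, 24.99, db, cache)  # write plus invalidation
print(get_price(42, db, cache))     # prints 24.99
```

Without the `cache.pop` line, the final read would keep returning 19.99 until the entry expired on its own.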

What Goes Wrong: Three Cache Failure Modes

Adding a cache without understanding its failure modes creates new classes of bugs. These are the three most important.

Cache Stampede (Thundering Herd)

A popular cache key expires. All concurrent requests get a cache miss at the same moment and flood the database with identical queries. Under heavy load, this can cause the database to buckle under the sudden spike — turning a brief cache expiry into a service outage. The more popular the key and the more concurrent users you have, the more severe the effect.

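One common mitigation is to let only a single caller rebuild a missing key while the others wait for its result. The sketch below shows that idea with per-key locks inside one process. `SingleFlightCache` is a hypothetical name, TTL handling is left out for brevity, and a multi-server deployment would need a distributed lock rather than `threading.Lock`.

```python
import threading
from typing import Any, Callable


class SingleFlightCache:
    """On a miss, only one thread per key runs the loader (illustration only)."""

    def __init__(self) -> None:
        self._values = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the _locks dictionary

    def _lock_for(self, key: str):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        if key in self._values:
            return self._values[key]      # fast path: cache hit
        with self._lock_for(key):
            if key in self._values:       # another thread loaded it while we waited
                return self._values[key]
            value = loader()              # exactly one thread reaches the database
            self._values[key] = value
            return value
```

The second membership check inside the lock is what prevents the herd: threads that were blocked find the value already present and never call the loader.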

Cache penetration is a different failure: requests for keys that do not exist in the cache or the database. Every such request is a guaranteed miss that hits the database directly. This is often triggered by malicious actors who probe your API with random or non-existent IDs, effectively bypassing the cache entirely. The fix is to cache negative results — store a sentinel value (e.g., null) for missing keys with a short TTL, so repeated lookups for the same non-existent key are served from cache. A Bloom filter can also sit in front of the database as a first line of defense: it is a space-efficient probabilistic data structure that can definitively confirm a key does not exist (with no false negatives), allowing you to reject impossible lookups before they reach the database. Note that Bloom filters can produce false positives — occasionally reporting a key as potentially present when it isn't — but they never miss a key that actually exists.
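Both defenses can be sketched together. The `BloomFilter` class below is a deliberately small teaching version built on `hashlib`, not a production library, and `find_user`, the `MISSING` sentinel, and the dictionaries standing in for the cache and database are assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


MISSING = "__missing__"  # sentinel cached for rows that do not exist


def find_user(user_id: str, cache: dict, db: dict, known_ids: BloomFilter):
    key = f"user:{user_id}"
    if not known_ids.might_contain(key):
        return None                      # definitely absent: skip cache and DB
    if key in cache:
        return None if cache[key] == MISSING else cache[key]
    row = db.get(key)                    # filter false positives end up here
    cache[key] = MISSING if row is None else row  # real code: short TTL on MISSING
    return row
```

The filter rejects most impossible lookups outright, and the negative cache entry absorbs repeats of the few that slip through.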

Cache poisoning occurs when malicious or incorrect data is injected into the cache — for example, through a manipulated request that tricks the cache into storing a response intended for one user and serving it to others. This is particularly dangerous at the CDN layer, where a poisoned entry gets served to many users. Prevention: never use user-supplied input directly as cache keys without sanitization, and set Vary headers correctly so user-specific responses are never shared across users.
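At the application layer, one way to apply the same advice is to build cache keys from an allowlist of parameters and to scope user-specific entries by user. This is only a sketch: `cache_key`, `ALLOWED_PARAMS`, and the key layout are hypothetical, and it complements rather than replaces correct `Vary` and `Cache-Control` headers at the CDN.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode

ALLOWED_PARAMS = {"page", "sort"}  # hypothetical allowlist for this endpoint


def cache_key(path: str, query: str, user_id: Optional[str] = None) -> str:
    """Build a key from allowlisted, sorted query parameters only."""
    params = sorted((k, v) for k, v in parse_qsl(query) if k in ALLOWED_PARAMS)
    scope = f"user:{user_id}" if user_id else "public"
    return f"{scope}|{path}?{urlencode(params)}"


print(cache_key("/products", "sort=price&utm=abc&page=2"))
# prints public|/products?page=2&sort=price
```

Dropping unknown parameters keeps arbitrary input out of the key, and the `user:` prefix ensures one user's response is never stored under a key another user can hit.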

CDNs: Caching at the Network Edge

A CDN applies the same caching principle at a global scale. Instead of users making requests across the internet to your origin server in one data center, their requests are intercepted by the nearest edge node — one of hundreds of servers co-located at internet exchange points around the world. If the edge node has the content cached, it responds immediately without the request ever reaching your origin.

CDN Architecture: Edge Caching

A CDN distributes copies of your content to edge nodes worldwide. Users are automatically routed to a nearby node, typically via DNS or anycast routing. Static content (images, CSS, JS) and even API responses can be cached at the edge, cutting latency for distant users and dramatically reducing load on your origin server.

Controlling CDN Caching with HTTP Headers

Your server controls CDN caching behavior through the Cache-Control response header. This is a set of directives your server sends with every response, telling browsers and CDN edge nodes what they are allowed to cache and for how long:

Directive	What It Does	Example Use Case
`Cache-Control: public`	Any cache (browser or CDN) may store this response	Public API responses, static files
`Cache-Control: private`	Only the user's browser may cache; CDN must not store it	Account pages, logged-in dashboard
`Cache-Control: no-store`	Nothing may cache this — always fetch fresh from origin	Payment pages, sensitive forms
`Cache-Control: max-age=3600`	Cache for 3600 seconds (1 hour) in browsers and CDNs	Semi-static content like blog posts
`Cache-Control: s-maxage=86400`	CDNs cache for 24 hours; browser uses `max-age` if set	Static assets served via CDN
`Cache-Control: stale-while-revalidate=60`	For up to 60 seconds after the response expires, serve the stale copy immediately while refreshing it in the background	High-traffic pages where freshness can lag slightly
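To show how these directives combine in practice, the sketch below maps a few response types to a header value. The category names and the specific durations are illustrative assumptions, not recommendations for any particular site.

```python
def cache_control_for(kind: str) -> str:
    """Return a Cache-Control value for a response category (illustrative)."""
    policies = {
        # Content-hashed static files: safe to cache for a week everywhere.
        "hashed_asset": "public, max-age=604800",
        # Public API data: short browser TTL, longer CDN TTL, stale serving allowed.
        "public_api": "public, max-age=30, s-maxage=60, stale-while-revalidate=60",
        # Logged-in pages: browser only, never the CDN.
        "account_page": "private, max-age=60",
        # Sensitive pages: nothing may store the response.
        "payment_page": "no-store",
    }
    return policies.get(kind, "no-store")  # unknown categories are not cached
```

Defaulting unknown categories to `no-store` is a conservative choice in this sketch: a new endpoint stays uncached until someone decides how it should be cached.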

Cache-busting is how you handle long TTLs safely. Instead of setting a 1-week TTL on /static/app.js, you deploy it as /static/app.a3f2d1.js — where the hash in the filename changes every time the file's content changes. Since the URL itself changes when the content changes, you can set the TTL to effectively infinity. Any update produces a new URL, and browsers and CDNs automatically fetch the new file. This is what most modern build tools (Vite, webpack, Next.js) do automatically for production builds.
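Build tools handle this for you, but the idea is simple enough to sketch by hand. The helper below derives a filename from a hash of the file's bytes; `hashed_name`, the choice of SHA-256, and the 8-character digest length are assumptions for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = hashed_name("/static/app.js", b"console.log('v1');")
new = hashed_name("/static/app.js", b"console.log('v2');")
# old and new are different URLs, so caches never confuse the two versions.
```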

Cache Eviction: LRU vs. LFU

Caches are finite. When the cache is full and a new item needs to be stored, something must be evicted. The eviction policy determines what gets removed.

Policy	How It Works	Best For	Watch Out For
LRU (Least Recently Used)	Evicts the item that hasn't been accessed for the longest time	Dynamic workloads, user sessions, social media feeds — anything with strong temporal locality	Cyclic batch jobs that scan all data evict recently-used popular items
LFU (Least Frequently Used)	Evicts the item with the fewest total accesses	Stable popularity patterns — e-commerce catalogs where top products get orders of magnitude more traffic than obscure ones	Newly added popular items are evicted before they accumulate enough access counts
Random	Evicts a randomly selected item	When all items have roughly equal access frequency	Unpredictable — rarely optimal in production
TTL-based (volatile-ttl)	Among keys that have a TTL, evicts the one with the shortest remaining TTL first	When you want natural TTL-based cleanup to drive eviction	Keys without a TTL are never evicted, and long-TTL keys are evicted last even if they're cold

In Redis, you set the eviction policy with maxmemory-policy. The most common production choice is allkeys-lru — LRU across all keys. If your access pattern follows the 80/20 rule (most traffic goes to a small fraction of keys), allkeys-lfu typically delivers a higher cache hit ratio by protecting frequently accessed keys from eviction even when they haven't been used recently. If in doubt, start with allkeys-lru and measure your hit ratio.
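To make the LRU policy concrete, here is a minimal exact-LRU cache built on `OrderedDict`. `LRUCache` is a hypothetical class for illustration; Redis itself uses an approximated LRU based on sampling rather than a strict ordering like this.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used key
cache.put("c", 3)  # over capacity: "b" is evicted, not "a"
```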

The practical rule: always configure maxmemory and an eviction policy other than noeviction before deploying Redis to production. Without a memory limit, Redis will keep growing until the server runs out of memory and the process crashes. With a limit but the default noeviction policy, Redis rejects new writes with errors once the limit is reached. AI agents do not configure this, so you must add it to your infrastructure spec.

The Full Picture: How the Layers Work Together

Caching is not a single decision. It's a set of layers that each reduce load on the layer below. In a well-designed system, a request might be served from the browser cache, a CDN edge node, or the application cache (Redis) without ever reaching your database.

Each cache layer has a different scope, TTL range, and invalidation challenge. The browser cache is private to one user — no coordination needed, but you cannot invalidate it remotely once the TTL is set. The CDN cache is shared and global — fast to read, but slow to purge when you need to push a fix. The application cache (Redis) is under your full control — you can invalidate it instantly in code, but it adds operational complexity and a new dependency to manage.

AI Agents and Caching: A Practical Guide

AI agents typically generate functional data access code without caching. They rarely add Redis unless you tell them to, set TTLs unless you specify one, or implement invalidation unless you describe the pattern. When you accept AI-generated data access code without reviewing it for caching, you are accepting a system with no caching layer by default.

What AI Does by Default	What You Need to Specify
Queries the database on every request	"Cache the result in Redis with a TTL of X seconds using cache-aside"
Writes to the database and returns — cache untouched	"After updating the record, delete or update the cache key for this item"
No TTL on cache keys	"Every cached key must have a TTL — never store without expiry"
No eviction policy on Redis	"Set maxmemory and maxmemory-policy allkeys-lru in Redis config"
Static files served from origin on every request	"Configure CDN caching — static assets use s-maxage=604800, dynamic API responses use s-maxage=60 or Cache-Control: private"
No cache-busting strategy	"Use content-hashed filenames for all static assets so CDN TTLs can be set to 1 week safely"

The pattern is consistent with spec-driven development: define your caching requirements before prompting. Specify which endpoints should be cached, what the TTL is, whether the cache should be invalidated on write, and how to handle the cold-start case. An agent given these constraints is far more likely to produce a correct caching implementation. An agent without them will usually skip it entirely.

Summary

Concept	The Rule
Cache-aside	Default pattern: check cache, fall back to DB on miss, populate cache, set TTL. Resilient, memory-efficient, but stale after writes.
Write-through	Every write updates cache and DB together. Always consistent, but slower writes and risk of cache pollution.
TTL	Always set a TTL. Match it to how fast your data changes. Add jitter to prevent synchronized expiry.
Cache invalidation	Hard in distributed systems. Use TTL for most cases; add explicit key deletion on write for critical data. AI agents skip this by default.
Cache stampede	Popular key expires → all requests miss → DB floods. Fix with TTL jitter, distributed locking, or background refresh.
Cache penetration	Requests for non-existent keys bypass cache entirely. Fix with negative caching (store null) or a Bloom filter.
CDN	Cache at the edge, close to users. Use long TTLs for static assets with content-hashed filenames. Mark user-specific responses Cache-Control: private so the CDN never stores them.
Eviction policy	Set maxmemory and maxmemory-policy before deploying Redis. Start with allkeys-lru; switch to allkeys-lfu for skewed access patterns.
Cache hit ratio	Target >90%. A ratio below 80% means the cache is underperforming. Investigate TTL, key design, or access patterns.

Caching is one of the highest-leverage optimizations in any production system. A well-configured cache layer can reduce database read load by 80–95%, cut data-access latency from hundreds of milliseconds to around a millisecond, and shield your database from traffic spikes. But every cache introduces a consistency trade-off, a new failure mode, and a new operational concern. The engineers who use caching well are the ones who understand these trade-offs and who specify them explicitly enough that AI agents can implement them correctly.
