Scaling Up vs. Scaling Out

Every system starts on a single server. Then users arrive, traffic grows, and the server begins to strain. At this point, you face the first fundamental scalability question: do you make the machine bigger, or do you add more machines?

This is the distinction between vertical scaling (scaling up) and horizontal scaling (scaling out). Understanding when to use each — and in what order — is one of the most important mental models in production system design.

What Is Scalability?

Scalability is a system's ability to handle increasing load by adding resources. "Load" can mean many things: more users, more requests per second, more data, or more computation. A scalable system absorbs this growth without requiring a complete architectural rewrite.

Before choosing how to scale, you need to first answer a more fundamental question: what is the actual bottleneck?

  • Is the application server maxing out its CPU or memory?
  • Is the database struggling with too many concurrent queries?
  • Is a single slow service blocking everything else?

Adding resources to the wrong layer is wasted effort. Doubling your application servers won't help if your database is the constraint — the database is still saturated, and every new server just adds more load to it. The first step is always profiling and measurement, not guessing.
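As a minimal illustration of measuring before scaling, the sketch below attributes request time to layers by wrapping calls in a timer. LayerTimer, timed, and the layer names are hypothetical. In production these numbers usually come from an APM or tracing tool rather than hand-rolled timers, and wall-clock time alone will not tell you whether a slow database is CPU-bound, lock-bound, or missing an index.

```typescript
// Hypothetical layer labels; use whatever boundaries your system actually has.
type Layer = "app" | "db" | "external";

// Accumulates wall-clock time spent in each layer across many requests.
class LayerTimer {
  private readonly totals = new Map<Layer, number>();

  record(layer: Layer, ms: number): void {
    this.totals.set(layer, (this.totals.get(layer) ?? 0) + ms);
  }

  // Layers sorted by their share of total recorded time, largest first.
  breakdown(): Array<{ layer: Layer; share: number }> {
    const entries = Array.from(this.totals.entries());
    const total = entries.reduce((sum, [, ms]) => sum + ms, 0);
    return entries
      .map(([layer, ms]) => ({ layer, share: total === 0 ? 0 : ms / total }))
      .sort((a, b) => b.share - a.share);
  }
}

// Wraps an async call (for example a database query) and records how long it took.
async function timed<T>(timer: LayerTimer, layer: Layer, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    timer.record(layer, Date.now() - start);
  }
}
```

If breakdown() reports most of the time in the "db" layer, adding application servers will not help; the next step is to look at the database itself.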

Vertical Scaling: Make the Machine Bigger

Vertical scaling means upgrading the single machine your system runs on — more CPU cores, more RAM, faster storage. You don't change your application code or architecture; you give the same code more resources.

Vertical Scaling (Scale Up)

Upgrade the existing server with more powerful hardware. The application architecture stays the same — only the machine size changes. This is almost always the right first move.


The Key Weakness: Single Point of Failure

The most serious problem with vertical scaling is that it does not improve reliability at all. A bigger machine is still one machine. If it crashes, gets hit by a hardware failure, or simply needs to be restarted for an update, your entire system is offline.

For internal tools with low availability requirements, this is often acceptable. For production systems with paying users or SLAs, a single point of failure is a risk you must eventually address — and that is where horizontal scaling comes in.
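A back-of-the-envelope calculation shows why a second machine helps reliability where a bigger one cannot. This sketch assumes each server is independently available with the same probability, which real deployments only approximate (shared power, networks, bad deploys, and the load balancer itself can take down several instances at once); the 99% figure is illustrative, not a measurement.

```typescript
// Probability that at least one of `replicas` identical servers is up,
// assuming each is independently up with probability `singleServer`.
function availabilityWithReplicas(singleServer: number, replicas: number): number {
  if (singleServer < 0 || singleServer > 1) {
    throw new RangeError("availability must be between 0 and 1");
  }
  if (!Number.isInteger(replicas) || replicas < 1) {
    throw new RangeError("replicas must be a positive integer");
  }
  // The system is down only when every replica is down at the same time.
  return 1 - Math.pow(1 - singleServer, replicas);
}

const HOURS_PER_YEAR = 24 * 365;

for (const n of [1, 2, 3]) {
  const a = availabilityWithReplicas(0.99, n);
  const downtimeHours = (1 - a) * HOURS_PER_YEAR;
  console.log(`${n} server(s): ${(a * 100).toFixed(4)}% up, ~${downtimeHours.toFixed(2)} h down/year`);
}
```

Upgrading the single server leaves replicas at 1, so it does not move this number. Under these assumptions, going from one server to two cuts expected downtime from about 87.6 hours a year to under one hour.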

Horizontal Scaling: Add More Machines

Horizontal scaling means running multiple copies of your server and distributing traffic across them. Instead of one powerful machine, you run several smaller machines behind a load balancer — a component that receives all incoming requests and routes them across the pool of servers.

Horizontal Scaling (Scale Out)

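Multiple server instances sit behind a load balancer. Incoming traffic is distributed across all instances. If one server fails, the load balancer stops sending it traffic and routes requests to the remaining healthy ones, so the service stays up (requests already in flight on the failed server may still error before health checks catch the failure).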

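The failover behavior in the caption can be modeled in a few lines. This is an in-process sketch of round-robin routing that skips unhealthy servers, not how NGINX, HAProxy, or a cloud load balancer is actually implemented. RoundRobinBalancer and its markDown, markUp, and addServer methods are hypothetical, and real balancers update server health from periodic health checks rather than explicit calls.

```typescript
// Round-robin over healthy servers only. Real load balancers mark servers
// up or down from periodic health checks; here that is done by hand.
class RoundRobinBalancer {
  private readonly servers: string[];
  private readonly healthy = new Set<string>();
  private next = 0;

  constructor(servers: string[]) {
    this.servers = servers.slice();
    this.servers.forEach((s) => this.healthy.add(s));
  }

  markDown(server: string): void {
    this.healthy.delete(server);
  }

  markUp(server: string): void {
    if (this.servers.indexOf(server) !== -1) this.healthy.add(server);
  }

  // A new instance joins the rotation immediately; safe only if servers are stateless.
  addServer(server: string): void {
    this.servers.push(server);
    this.healthy.add(server);
  }

  // Returns the next healthy server in rotation, skipping any marked down.
  pick(): string {
    for (let tried = 0; tried < this.servers.length; tried++) {
      const candidate = this.servers[this.next];
      this.next = (this.next + 1) % this.servers.length;
      if (this.healthy.has(candidate)) return candidate;
    }
    throw new Error("no healthy servers");
  }
}
```

Nothing in pick() looks at who sent the request, so two consecutive requests from the same user can land on different servers.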

The Stateless Requirement

Horizontal scaling works cleanly only if your application servers are stateless — meaning any server can handle any request, with no memory of previous requests stored locally on that server.

A load balancer typically routes each incoming request independently, often in round-robin order — Server 1 gets the first request, Server 2 gets the second, and so on. This means two back-to-back requests from the same user may land on completely different servers.

Think about what breaks if your app is not stateless: a user logs in, and Server 1 stores their session in memory. The next request is routed by the load balancer to Server 2. Server 2 has no session data — the user appears logged out. No crash, no error message, just broken, inconsistent behavior.
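The failure is easy to reproduce in a sketch. It assumes two hypothetical app servers that each keep sessions in their own process memory (the role an in-memory session store plays), with plain round-robin routing in front of them; the class name and session IDs are made up for illustration.

```typescript
// Each instance keeps sessions in its own process memory, like a default in-memory store.
class StatefulServer {
  private readonly sessions = new Map<string, string>(); // sessionId -> userId, local to this process

  constructor(readonly name: string) {}

  login(sessionId: string, userId: string): void {
    this.sessions.set(sessionId, userId);
  }

  // Returns the user for this session, or null if this instance never saw it.
  whoAmI(sessionId: string): string | null {
    return this.sessions.get(sessionId) ?? null;
  }
}

const demoServers = [new StatefulServer("server-1"), new StatefulServer("server-2")];
let requestCount = 0;
// Plain round-robin: the next request goes to the next server, whoever sent it.
const route = (): StatefulServer => demoServers[requestCount++ % demoServers.length];

route().login("sess-abc", "alice"); // request 1 -> server-1 stores the session
console.log(route().whoAmI("sess-abc")); // request 2 -> server-2: prints null (user looks logged out)
console.log(route().whoAmI("sess-abc")); // request 3 -> server-1: prints alice
```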

The fix is to move all shared state out of the application server and into an external service that every instance can reach:

| State Type | Wrong: Stored on the Server | Right: Stored Externally |
|---|---|---|
| User sessions | In-process memory (MemoryStore) | Redis or database |
| File uploads | Local disk (/tmp/uploads) | Object storage (S3, GCS) |
| Application cache | In-process Map or LRU cache | Shared cache (Redis, Memcached) |
| Background jobs | In-memory queue array | Message queue (Redis, SQS, RabbitMQ) |
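Applied to the first row of the table, the fix looks roughly like this: servers depend on a session store interface and never hold session data themselves. SessionStore, InMemoryStandIn, and StatelessServer are hypothetical names. The stand-in keeps data in a Map only so the sketch runs without infrastructure; in production its place would be taken by a Redis- or database-backed implementation shared by every instance.

```typescript
// Anything every instance can reach: Redis, a database table, etc.
interface SessionStore {
  get(sessionId: string): Promise<string | null>;
  set(sessionId: string, userId: string, ttlSeconds: number): Promise<void>;
}

// Stand-in for an external store so the sketch runs without Redis.
// One instance of it is shared by all servers, which is the whole point.
class InMemoryStandIn implements SessionStore {
  private readonly data = new Map<string, { userId: string; expiresAt: number }>();

  async get(sessionId: string): Promise<string | null> {
    const entry = this.data.get(sessionId);
    if (!entry || entry.expiresAt <= Date.now()) return null; // missing or expired
    return entry.userId;
  }

  async set(sessionId: string, userId: string, ttlSeconds: number): Promise<void> {
    this.data.set(sessionId, { userId, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

// The server holds a reference to the store, never the session data itself.
class StatelessServer {
  constructor(readonly name: string, private readonly store: SessionStore) {}

  login(sessionId: string, userId: string): Promise<void> {
    return this.store.set(sessionId, userId, 3600);
  }

  whoAmI(sessionId: string): Promise<string | null> {
    return this.store.get(sessionId);
  }
}
```

Because every instance is constructed with the same store, a login handled by one is visible to all the others, and replacing the stand-in with a real shared store requires no change to StatelessServer.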

Once your application is stateless, adding a new instance is straightforward: as soon as it is registered with the load balancer and passes its health checks, it starts receiving traffic, with no coordination with the other instances required.

Databases: The Hard Part

Notice that the horizontal scaling diagram above still shows a single shared database. This is intentional: databases are far harder to scale horizontally than application servers.

Application servers are stateless by design (once you fix them). Databases are inherently stateful — they store data that must be consistent (all copies agree on the same values), durable (written data survives crashes), and queryable. You cannot simply spin up five database copies and put a load balancer in front of them without solving data consistency — a write to one copy must be correctly reflected in all others, which requires coordination.
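A deliberately naive sketch makes the consistency problem concrete: two independent copies of the data behind round-robin routing, with nothing replicating writes between them. NaiveCopy and the key names are hypothetical; no real database is deployed this way, which is the point.

```typescript
// One "database copy" with no link to any other copy.
class NaiveCopy {
  private readonly rows = new Map<string, number>();

  write(key: string, value: number): void {
    this.rows.set(key, value);
  }

  read(key: string): number | undefined {
    return this.rows.get(key);
  }
}

const copies = [new NaiveCopy(), new NaiveCopy()];
let nextIndex = 0;
// Round-robin, exactly as a load balancer would do for stateless app servers.
const nextCopy = (): NaiveCopy => copies[nextIndex++ % copies.length];

nextCopy().write("balance:alice", 100); // goes to copy 0 only
console.log(nextCopy().read("balance:alice")); // copy 1 never saw the write: prints undefined
```

Replication schemes exist to close this gap, and each one has to decide which copies accept writes and how quickly the others catch up.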

This is why databases typically stay vertically scaled much longer than the application layer. The practical progression is to optimize queries and scale the single database server up, then add read replicas, and shard only as a final phase.

Most teams never reach the sharding phase. A well-tuned PostgreSQL instance on a large vertical machine, backed by read replicas, can serve enormous workloads. Database sharding, which splits data across multiple servers by a partition key, introduces significant complexity and should be treated as a last resort, not a first move. We cover read replicas and sharding in depth in the next sections.
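To illustrate what splitting data by a partition key means (not to encourage it early), here is a sketch that maps a user ID to one of N shards with a hash. FNV-1a is used only because it is short and deterministic; the function names are hypothetical, and real sharded systems typically use consistent hashing, range partitioning, or a lookup directory instead of a bare modulo.

```typescript
// 32-bit FNV-1a over UTF-16 code units: simple and deterministic, not cryptographic.
function fnv1a32(key: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned 32-bit
  }
  return hash;
}

// Every query for a given partition key must be sent to the shard this returns.
function shardFor(partitionKey: string, shardCount: number): number {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError("shardCount must be a positive integer");
  }
  return fnv1a32(partitionKey) % shardCount;
}
```

The modulo step also shows why naive sharding is painful to change: going from 4 to 5 shards reassigns most keys, so most of the data has to move.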

The key insight: before you shard, add read replicas. Routing read queries to replicas while the primary handles only writes can multiply your total read capacity several times over with minimal architectural change.
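Read/write splitting is usually a small routing layer in front of the connection pools. The sketch below assumes hypothetical pool objects identified only by name and a crude rule that treats statements starting with SELECT as reads. It encodes two caveats: replicas typically apply changes asynchronously and can lag the primary, so a read that must see a just-committed write should go to the primary; and some SELECTs (for example SELECT ... FOR UPDATE) take locks and belong on the primary too, which a prefix check will not catch.

```typescript
// Stand-in for a connection pool; only the name matters for this sketch.
interface Pool {
  readonly name: string;
}

class ReadWriteRouter {
  private nextReplica = 0;

  constructor(private readonly primary: Pool, private readonly replicas: Pool[]) {}

  // Crude: treats statements that start with SELECT as reads.
  private static isRead(sql: string): boolean {
    return /^\s*select\b/i.test(sql);
  }

  // Writes, and reads that must see the latest committed data, go to the primary.
  // Other reads rotate across the replicas.
  route(sql: string, needsFreshRead = false): Pool {
    if (!ReadWriteRouter.isRead(sql) || needsFreshRead || this.replicas.length === 0) {
      return this.primary;
    }
    const replica = this.replicas[this.nextReplica];
    this.nextReplica = (this.nextReplica + 1) % this.replicas.length;
    return replica;
  }
}
```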

Making the Decision

When a system starts to struggle under load, the most important step comes first: measure before you scale. The most common engineering mistake is adding capacity to the wrong layer, such as adding more application servers when the database is saturated, or buying a bigger database when the real issue is unindexed queries.

Why the Answer Is Almost Always "Both, In Order"

In practice, most production systems go through this lifecycle:

| Stage | Architecture | Primary Scaling Strategy | Signal to Move On |
|---|---|---|---|
| Early | Single server running app and database together | Scale up as needed | Single server is a reliability risk, or approaching hardware limits |
| Growth | App server and database on separate machines | Scale each independently; add DB read replicas | App layer needs more capacity than one machine can provide |
| Scale | Multiple app servers behind a load balancer; primary DB with read replicas | Scale out app layer; scale up DB; relentlessly optimize queries | Write volume or total data size exceeds what a single primary DB can serve |
| Mature | Horizontally scaled app layer; potentially sharded or distributed DB | Both: tune instance sizes vertically, scale out horizontally as needed | Ongoing: scaling becomes a continuous operational discipline |

The pattern: start simple, scale vertically to buy time, then introduce horizontal scaling when reliability or load actually demands it — not before. Teams that build fully distributed, horizontally scaled architectures from day one almost always over-engineer. They pay the complexity costs before those costs are justified by actual load.

A useful rule of thumb: vertical scaling buys simplicity; horizontal scaling buys resilience. Use each when its benefit is actually needed.

What AI Agents Get Wrong About Scaling

When you ask an AI agent to build a backend service, it will often generate code that is implicitly stateful in at least one of these ways:

  • In-memory caching: a Map or dictionary stored at the module level — invisible to other instances.
  • Local file writes: temporary uploads or logs written to the local filesystem.
  • Server-side sessions with MemoryStore: the default session store in frameworks like Express, explicitly not designed for production.

This code works correctly in development — where there is only one instance — and produces silent, intermittent bugs in production the moment you scale to two instances.
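The first bullet is the easiest to catch in review. The sketch below shows both shapes of a cache-aside lookup. SharedCache is a hypothetical interface whose production implementation would talk to Redis or Memcached, and loadProfile is a hypothetical stand-in for a database query.

```typescript
// Anti-pattern: a module-level Map. Each process gets its own copy, so instances
// serve different data, invalidation on one instance leaves the others stale,
// and the Map grows without bound.
const profileCache = new Map<string, string>();

async function getProfileLocal(userId: string, loadProfile: (id: string) => Promise<string>): Promise<string> {
  const hit = profileCache.get(userId);
  if (hit !== undefined) return hit;
  const fresh = await loadProfile(userId);
  profileCache.set(userId, fresh);
  return fresh;
}

// Stateless shape: the cache is injected, so every instance can share one backend.
interface SharedCache {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

async function getProfile(
  userId: string,
  cache: SharedCache,
  loadProfile: (id: string) => Promise<string>,
): Promise<string> {
  const key = `profile:${userId}`;
  const hit = await cache.get(key);
  if (hit !== undefined) return hit;
  const fresh = await loadProfile(userId);
  await cache.set(key, fresh, 300); // TTL bounds staleness if an invalidation is missed
  return fresh;
}
```

Only the second version behaves the same on one instance or ten, because every instance reads and writes the same cache backend.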

Stateful vs. Stateless Server Design

The most common AI-generated scaling anti-pattern: session state stored in process memory. Works on one server, breaks silently on two. The fix is simple — move state to an external store.


One specific pattern to check for: express-session, the session middleware most commonly used with Express.js, defaults to MemoryStore, which keeps session data in a plain JavaScript object inside the server process. The library itself warns against this when NODE_ENV is set to production: "Warning: connect.session() MemoryStore is not designed for a production environment, as it will leak memory, and will not scale past a single process." It is, however, exactly what AI agents tend to generate by default, because it requires no configuration and works fine with a single server. Always swap it for a Redis-backed session store (connect-redis) before deploying behind a load balancer.

Summary

| Concept | Key Point |
|---|---|
| Vertical scaling | Make the server bigger. Fast, zero code changes, but has a hardware ceiling and leaves you with a single point of failure |
| Horizontal scaling | Add more servers. Removes the single server as a point of failure and is not capped by one machine's hardware, but requires stateless application design and a load balancer |
| Stateless design | The prerequisite for horizontal scaling: all persistent state must live in external services (Redis, S3, database), not on the application server |
| Databases scale differently | Database servers stay vertically scaled longer; the first step is adding read replicas, not sharding, which is a last resort |
| Measure first | Always identify the actual bottleneck before adding capacity; adding servers to the wrong layer solves nothing |
| The right order | Start on a single server, scale up to buy time, introduce horizontal scaling when reliability or load actually demands it |
| AI agent blind spot | AI-generated code is often implicitly stateful. Explicitly ask for stateless design and review for in-memory state, local file I/O, and default session stores before deploying |

The most common mistake — with or without AI assistance — is reaching for distributed systems complexity before it is warranted. Vertical scaling is not a compromise or a stopgap: a well-tuned, vertically scaled database will often outperform a poorly designed distributed one. The engineering skill is in knowing which stage your system is actually in — and resisting the urge to architect for the stage three steps ahead.
