Scaling Up vs. Scaling Out
Every system starts on a single server. Then users arrive, traffic grows, and the server begins to strain. At this point, you face the first fundamental scalability question: do you make the machine bigger, or do you add more machines?
This is the distinction between vertical scaling (scaling up) and horizontal scaling (scaling out). Understanding when to use each — and in what order — is one of the most important mental models in production system design.
What Is Scalability?
Scalability is a system's ability to handle increasing load by adding resources. "Load" can mean many things: more users, more requests per second, more data, or more computation. A scalable system absorbs this growth without requiring a complete architectural rewrite.
Before choosing how to scale, you need to first answer a more fundamental question: what is the actual bottleneck?
- Is the application server maxing out its CPU or memory?
- Is the database struggling with too many concurrent queries?
- Is a single slow service blocking everything else?
Adding resources to the wrong layer is wasted effort. Doubling your application servers won't help if your database is the constraint — the database is still saturated, and every new server just adds more load to it. The first step is always profiling and measurement, not guessing.
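As a minimal sketch of "measure first", the snippet below compares utilization across layers and reports the one under the most pressure. The `LayerMetrics` shape, the 0.8 threshold, and the sample numbers are all assumptions made for illustration; real figures would come from your monitoring or profiling tools.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayerMetrics:
    """Utilization snapshot for one layer (hypothetical shape)."""
    name: str
    cpu: float      # fraction of CPU in use, 0.0 to 1.0
    memory: float   # fraction of memory in use, 0.0 to 1.0


def most_saturated(layers: list[LayerMetrics], threshold: float = 0.8) -> Optional[LayerMetrics]:
    """Return the layer under the most resource pressure, or None if no
    layer reaches the (assumed) saturation threshold."""
    if not layers:
        return None

    def pressure(layer: LayerMetrics) -> float:
        return max(layer.cpu, layer.memory)

    worst = max(layers, key=pressure)
    return worst if pressure(worst) >= threshold else None


# Made-up numbers: the app tier is mostly idle while the database is pinned.
snapshot = [
    LayerMetrics("app-server", cpu=0.35, memory=0.40),
    LayerMetrics("database", cpu=0.95, memory=0.70),
]
bottleneck = most_saturated(snapshot)
print(bottleneck.name if bottleneck else "no saturated layer")  # prints "database"
```

With numbers like these, adding application servers would only push more queries at a database that is already saturated.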
Vertical Scaling: Make the Machine Bigger
Vertical scaling means upgrading the single machine your system runs on — more CPU cores, more RAM, faster storage. You don't change your application code or architecture; you give the same code more resources.
Figure: Vertical Scaling (Scale Up). Upgrade the existing server with more powerful hardware. The application architecture stays the same; only the machine size changes. This is almost always the right first move.
The Key Weakness: Single Point of Failure
The most serious problem with vertical scaling is that it does not improve reliability at all. A bigger machine is still one machine. If it crashes, gets hit by a hardware failure, or simply needs to be restarted for an update, your entire system is offline.
For internal tools with low availability requirements, this is often acceptable. For production systems with paying users or SLAs, a single point of failure is a risk you must eventually address — and that is where horizontal scaling comes in.
Horizontal Scaling: Add More Machines
Horizontal scaling means running multiple copies of your server and distributing traffic across them. Instead of one powerful machine, you run several smaller machines behind a load balancer — a component that receives all incoming requests and routes them across the pool of servers.
Figure: Horizontal Scaling (Scale Out). Multiple server instances sit behind a load balancer. Incoming traffic is distributed across all instances. If one server fails, the load balancer routes traffic to the remaining healthy ones, so the system stays up.
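To make the routing behavior concrete, here is a small sketch of a round-robin balancer that skips unhealthy servers. `RoundRobinBalancer`, `mark_down`, and the server names are hypothetical; production load balancers detect failures through health checks rather than manual calls.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: hands out servers in rotation, skipping unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        # In a real system a failed health check would trigger this.
        self.healthy.discard(server)

    def next_server(self):
        # Try each server at most once per call before giving up.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["server-1", "server-2", "server-3"])
first_pass = [lb.next_server() for _ in range(3)]

lb.mark_down("server-2")  # simulate a crash
after_failure = [lb.next_server() for _ in range(4)]

print(first_pass)     # ['server-1', 'server-2', 'server-3']
print(after_failure)  # ['server-1', 'server-3', 'server-1', 'server-3']
```

Traffic keeps flowing after the simulated crash because the remaining instances absorb the requests that would have gone to the failed one.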
The Stateless Requirement
Horizontal scaling works cleanly only if your application servers are stateless — meaning any server can handle any request, with no memory of previous requests stored locally on that server.
A load balancer typically routes each incoming request independently, often in round-robin order — Server 1 gets the first request, Server 2 gets the second, and so on. This means two back-to-back requests from the same user may land on completely different servers.
Think about what breaks if your app is not stateless: a user logs in, and Server 1 stores their session in memory. The next request is routed by the load balancer to Server 2. Server 2 has no session data — the user appears logged out. No crash, no error message, just broken, inconsistent behavior.
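The failure described above can be reproduced in a few lines. This is a simulation only: `StatefulServer` and `route` are hypothetical stand-ins for two application instances and a round-robin load balancer.

```python
class StatefulServer:
    """Simulated app instance that keeps sessions in its own process memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # invisible to every other instance

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        return self.sessions.get(session_id)


servers = [StatefulServer("server-1"), StatefulServer("server-2")]


def route(request_number):
    # Round-robin: request 0 -> server-1, request 1 -> server-2, and so on.
    return servers[request_number % len(servers)]


route(0).login("sess-abc", "alice")          # handled by server-1
second = route(1).whoami("sess-abc")         # handled by server-2
third = route(2).whoami("sess-abc")          # back on server-1

print(second)  # None: server-2 has never seen this session
print(third)   # alice: the session exists only on server-1
```

No exception is raised at any point, which is why this class of bug tends to show up as intermittent "logged out" reports rather than as errors in the logs.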
The fix is to move all shared state out of the application server and into an external service that every instance can reach:
| State Type | Wrong: Stored on the Server | Right: Stored Externally |
|---|---|---|
| User sessions | In-process memory (MemoryStore) | Redis or database |
| File uploads | Local disk (/tmp/uploads) | Object storage (S3, GCS) |
| Application cache | In-process Map or LRU cache | Shared cache (Redis, Memcached) |
| Background jobs | In-memory queue array | Message queue (Redis, SQS, RabbitMQ) |
Once your application is stateless, adding a new instance is trivial — the load balancer starts routing to it immediately with no coordination required.
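A sketch of the stateless version, under the assumption that sessions live in a shared store: `ExternalSessionStore` is a plain in-process dict standing in for something like Redis so the example runs without a network, and `StatelessServer` is a hypothetical app instance.

```python
class ExternalSessionStore:
    """Stand-in for a shared store such as Redis (a dict, for illustration only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class StatelessServer:
    """App instance that holds no session data of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # every instance points at the same store

    def login(self, session_id, user):
        self.store.set(session_id, user)

    def whoami(self, session_id):
        return self.store.get(session_id)


store = ExternalSessionStore()
pool = [StatelessServer("server-1", store), StatelessServer("server-2", store)]

pool[0].login("sess-abc", "alice")
seen_by_server_2 = pool[1].whoami("sess-abc")

# A brand-new instance can serve existing sessions immediately.
pool.append(StatelessServer("server-3", store))
seen_by_server_3 = pool[2].whoami("sess-abc")

print(seen_by_server_2, seen_by_server_3)  # alice alice
```

The trade-off is that the external store becomes a dependency of every request, so it needs its own availability and capacity planning.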
Databases: The Hard Part
Notice that the horizontal scaling diagram above still shows a single shared database. This is intentional: databases are far harder to scale horizontally than application servers.
Application servers can be made stateless once their state is moved out. Databases are inherently stateful: they store data that must be consistent (all copies agree on the same values), durable (written data survives crashes), and queryable. You cannot simply spin up five database copies and put a load balancer in front of them without solving data consistency: a write to one copy must be correctly reflected in all others, which requires coordination.
This is why databases typically stay vertically scaled much longer than the application layer. The practical progression looks like this:
Most teams never reach Phase 4. A well-tuned PostgreSQL instance on a large vertical machine, backed by read replicas, can serve enormous workloads. Database sharding — splitting data across multiple servers by a partition key — introduces significant complexity and should be treated as a last resort, not a first move. We cover read replicas and sharding in depth in the next sections.
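To illustrate what "splitting data by a partition key" involves, here is a minimal sketch of hash-based shard selection. The `shard_for` function and the `user-N` keys are hypothetical, and simple modulo hashing is only one of several schemes; it is chosen here because it makes the resharding cost easy to see.

```python
import hashlib


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index in [0, shard_count)."""
    # A stable hash is used instead of Python's built-in hash(), which is
    # randomized per process for strings and would route inconsistently.
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


keys = [f"user-{i}" for i in range(1000)]

# The same key always maps to the same shard.
assert shard_for("user-42", 4) == shard_for("user-42", 4)

# Growing from 4 to 5 shards changes the shard for most keys, so their
# rows would have to be migrated between servers.
moved = sum(1 for key in keys if shard_for(key, 4) != shard_for(key, 5))
print(f"{moved} of {len(keys)} keys change shards")
```

Routing a key is the easy part. Rebalancing data when the shard count changes, and handling queries that span shards, are where much of the added complexity comes from.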
The key insight: before you shard, add read replicas. Routing read queries to replicas while the primary handles only writes can multiply your total read capacity several times over with minimal architectural change.
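A rough sketch of read/write splitting follows. `ReplicatedDatabaseRouter` and the connection names are hypothetical, and classifying a statement by its first keyword is a deliberate simplification: it does not handle cases such as `SELECT ... FOR UPDATE` or reads that must see a write made a moment earlier, which should go to the primary because replicas can lag.

```python
from itertools import cycle


class ReplicatedDatabaseRouter:
    """Send reads to replicas in rotation and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = cycle(replicas) if replicas else None

    def target_for(self, sql):
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        # Naive classification: only plain SELECTs count as reads.
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReplicatedDatabaseRouter("primary", ["replica-1", "replica-2"])
queries = [
    "SELECT * FROM orders WHERE user_id = 7",
    "INSERT INTO orders (user_id) VALUES (7)",
    "select id from users",
    "UPDATE users SET name = 'a' WHERE id = 7",
]
targets = [router.target_for(q) for q in queries]
print(targets)  # ['replica-1', 'primary', 'replica-2', 'primary']
```

Each added replica takes a share of the read traffic, while write capacity stays bounded by the single primary.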
Making the Decision
Here is a practical decision flowchart for a system that is starting to struggle under load:
The most important step is at the top: measure before you scale. The most common engineering mistake is adding capacity to the wrong layer — adding more application servers when the database is saturated, or buying a bigger database when the issue is unindexed queries.
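The ordering this section argues for can be written down as a small decision function. This is one reading of the surrounding prose, not a reconstruction of the flowchart itself; the function name, parameters, and returned strings are all assumptions.

```python
def next_scaling_step(bottleneck, *, unindexed_queries=False,
                      at_hardware_ceiling=False, has_read_replicas=False,
                      app_is_stateless=False):
    """Suggest the next step given a measured bottleneck ('app', 'database', or None)."""
    if bottleneck is None:
        # Nothing has been measured yet, so there is nothing to scale.
        return "profile the system to find the bottleneck"

    if bottleneck == "database":
        if unindexed_queries:
            return "fix queries and add indexes"
        if not at_hardware_ceiling:
            return "scale the database up"
        if not has_read_replicas:
            return "add read replicas"
        return "consider sharding"

    if bottleneck == "app":
        if not at_hardware_ceiling:
            return "scale the app server up"
        if not app_is_stateless:
            return "move state to external services"
        return "add app instances behind a load balancer"

    raise ValueError(f"unknown bottleneck: {bottleneck!r}")


print(next_scaling_step(None))
print(next_scaling_step("database", unindexed_queries=True))
print(next_scaling_step("app", at_hardware_ceiling=True, app_is_stateless=True))
```

Reliability requirements can justify adding app instances earlier than this load-only ordering suggests, since a single larger machine remains a single point of failure.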
Why the Answer Is Almost Always "Both, In Order"
In practice, most production systems go through this lifecycle:
| Stage | Architecture | Primary Scaling Strategy | Signal to Move On |
|---|---|---|---|
| Early | Single server running app and database together | Scale up as needed | Single server is a reliability risk, or approaching hardware limits |
| Growth | App server and database on separate machines | Scale each independently; add DB read replicas | App layer needs more capacity than one machine can provide |
| Scale | Multiple app servers behind a load balancer; primary DB with read replicas | Scale out app layer; scale up DB; relentlessly optimize queries | Write volume or total data size exceeds what a single primary DB can serve |
| Mature | Horizontally scaled app layer; potentially sharded or distributed DB | Both: tune instance sizes vertically, scale out horizontally as needed | Ongoing — scaling becomes a continuous operational discipline |
The pattern: start simple, scale vertically to buy time, then introduce horizontal scaling when reliability or load actually demands it — not before. Teams that build fully distributed, horizontally scaled architectures from day one almost always over-engineer. They pay the complexity costs before those costs are justified by actual load.
A useful rule of thumb: vertical scaling buys simplicity; horizontal scaling buys resilience. Use each when its benefit is actually needed.
What AI Agents Get Wrong About Scaling
When you ask an AI agent to build a backend service, it will almost always generate code that is implicitly stateful in at least one of these ways:
- In-memory caching: a Map or dictionary stored at the module level, invisible to other instances.
- Local file writes: temporary uploads or logs written to the local filesystem.
- Server-side sessions with MemoryStore: the default session store in frameworks like Express, explicitly not designed for production.
This code works correctly in development — where there is only one instance — and produces silent, intermittent bugs in production the moment you scale to two instances.
Figure: Stateful vs. Stateless Server Design. The most common AI-generated scaling anti-pattern: session state stored in process memory. Works on one server, breaks silently on two. The fix is simple: move state to an external store.
One specific pattern to check for: express-session, the standard session middleware for Express.js, defaults to MemoryStore, which keeps session data in a plain JavaScript object inside the server process. The library itself prints a console warning: "Warning: connect.session() MemoryStore is not designed for a production environment." It is, however, exactly what AI agents generate by default, because it requires no configuration and works fine with a single server. Always swap it for a shared session store, such as a Redis-backed one (connect-redis), before deploying behind a load balancer.
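As a starting point for that review, a crude text scan can flag the three patterns listed above in generated JavaScript. This is a heuristic sketch, not a linter: the check names and regular expressions are assumptions, and it will produce both false positives and misses, so treat a hit as a prompt to read the code.

```python
import re


def uses_default_session_store(source: str) -> bool:
    # express-session falls back to MemoryStore when no `store` option is given.
    calls_session = re.search(r"\bsession\s*\(", source) is not None
    sets_store = re.search(r"\bstore\s*:", source) is not None
    return calls_session and not sets_store


def writes_local_files(source: str) -> bool:
    return re.search(r"/tmp/|\bwriteFile(?:Sync)?\b", source) is not None


def has_module_level_map(source: str) -> bool:
    # Only matches declarations that start at column 0, i.e. outside functions.
    return re.search(r"^(?:const|let|var)\s+\w+\s*=\s*new Map\(", source, re.MULTILINE) is not None


CHECKS = [
    ("default in-memory session store", uses_default_session_store),
    ("local file write", writes_local_files),
    ("module-level Map cache", has_module_level_map),
]


def find_stateful_patterns(source: str) -> list[str]:
    """Return the labels of every check that fires on the given source text."""
    return [label for label, check in CHECKS if check(source)]


sample = """
const cache = new Map();
app.use(session({ secret: 'dev-only' }));
app.post('/upload', (req, res) => {
  fs.writeFileSync('/tmp/uploads/' + req.file.name, req.file.data);
});
"""
for finding in find_stateful_patterns(sample):
    print("review:", finding)
```

On the sample above, all three checks fire; passing `store: ...` to `session(...)` would clear the first one.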
Summary
| Concept | Key Point |
|---|---|
| Vertical scaling | Make the server bigger. Fast, zero code changes, but has a hardware ceiling and leaves you with a single point of failure |
| Horizontal scaling | Add more servers. Removes the application server as a single point of failure and is not capped by one machine's hardware, but requires stateless application design and a load balancer |
| Stateless design | The prerequisite for horizontal scaling: all persistent state must live in external services (Redis, S3, database), not on the application server |
| Databases scale differently | Database servers stay vertically scaled longer; the first step is adding read replicas, not sharding — sharding is a last resort |
| Measure first | Always identify the actual bottleneck before adding capacity — adding servers to the wrong layer solves nothing |
| The right order | Start on a single server, scale up to buy time, introduce horizontal scaling when reliability or load actually demands it |
| AI agent blind spot | AI-generated code is often implicitly stateful. Explicitly ask for stateless design and review for in-memory state, local file I/O, and default session stores before deploying |
The most common mistake — with or without AI assistance — is reaching for distributed systems complexity before it is warranted. Vertical scaling is not a compromise or a stopgap: a well-tuned, vertically scaled database will often outperform a poorly designed distributed one. The engineering skill is in knowing which stage your system is actually in — and resisting the urge to architect for the stage three steps ahead.