Simplicity vs. Scalability: Don't Build for 100M Users If You Have 100

Every developer who has read about Netflix, Amazon, or Google has felt the pull: their architecture is impressive, they solved real distributed systems problems at scale — so I should build my system the same way. This instinct is almost always wrong, and acting on it is one of the most costly mistakes a team can make.

The reverse mistake is equally real: building so fast with so little structure that the system collapses the moment it attracts real traffic.

This tutorial is about how to navigate the space between these two failure modes. The goal is not to build the simplest possible system, nor the most scalable possible system — it is to build the right system for your current scale, with the ability to evolve as that scale changes.

The Two Failure Modes

Before choosing an architecture, you need to recognize which failure mode you are most at risk of falling into. Both are common, both are expensive, and in the moment they look very different from each other.

Two Ways to Fail: Under-Engineering and Over-Engineering

Under-engineering produces a system that collapses under its own success. Over-engineering produces a system so complex it never achieves that success. The failure modes look opposite but both have the same result: a system that cannot serve its users.


Real Systems: What Actually Happened

The companies cited most often as architectural role models did not start the way they ended up. Understanding when and why they changed is more useful than copying their end state.

Shopify: A Modular Monolith at Massive Scale

Shopify has run a Ruby on Rails monolith since 2004 — and still does today. On Black Friday 2024, it processed 173 billion requests, peaking at 284 million requests per minute and pushing 12 TB of traffic per minute through its edge network. All of this on a Rails monolith.

Shopify did attempt a microservices architecture in 2010–2012. The result was years of technical debt and cascading failures. They consciously reversed course. Their current strategy: a modular monolith with strictly enforced internal boundaries (using an open-source tool called Packwerk to detect and prevent illegal dependencies between modules), Kubernetes autoscaling for stateless services, and database sharding by shop_id for the one component that genuinely needed it.

The lesson: Microservices are not required to scale to the largest e-commerce loads on the internet. What is required is internal discipline — enforced module boundaries, not just informal ones.
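The one component Shopify did shard is keyed by shop_id. Below is a minimal sketch of key-based routing. It assumes a fixed shard count and a hypothetical `shard_for` helper, and it illustrates the technique only; it is not Shopify's implementation. Every query for one shop resolves to the same shard, so a shop's data never has to be joined across databases.

```python
import zlib

NUM_SHARDS = 8  # assumption: a fixed shard count, chosen only for illustration


def shard_for(shop_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Deterministically map a shop_id to a shard index in [0, num_shards).

    zlib.crc32 is used instead of the built-in hash(), because hash() of
    str/bytes is randomized per process and would route the same key
    differently after a restart.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    return zlib.crc32(str(shop_id).encode("utf-8")) % num_shards


# All reads and writes for shop 42 go to the same database.
print(shard_for(42), shard_for(42))
```

Plain modulo routing moves most keys whenever `num_shards` changes. Systems that expect to reshard therefore often route through a lookup directory or consistent hashing instead.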

Stack Overflow: 1.3 Billion Page Views per Month on 9 Servers

Stack Overflow handles 1.3 billion monthly page views, over 6,000 requests per second, and renders pages in approximately 12 milliseconds — on 9 on-premise web servers and a single SQL Server primary (with a hot standby). Each web server handles around 450 peak requests per second at roughly 12% CPU utilization.

Their philosophy: aggressive caching, a highly optimized monolith, and a deliberate choice to avoid distributed systems overhead for everything that does not require it. The Stack Overflow team has a name for this overhead — "the SOA tax" (SOA stands for Service-Oriented Architecture, the pattern that microservices descend from). Every service boundary you introduce adds network latency, serialization cost, and operational complexity. If you do not need the independence that boundary provides, you are paying a tax for nothing.
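A back-of-envelope model makes the tax concrete. The sketch below assumes that each synchronous service hop adds one network round trip plus request and response serialization. The per-hop costs are hypothetical placeholders, not Stack Overflow measurements.

```python
def soa_tax_ms(hops: int, rtt_ms: float = 1.0, serde_ms: float = 0.4) -> float:
    """Extra latency from `hops` sequential synchronous service calls.

    rtt_ms:   assumed network round trip per call (hypothetical default)
    serde_ms: assumed encode + decode time for one request and its response
    A direct in-process function call pays essentially none of this.
    """
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return hops * (rtt_ms + serde_ms)


for hops in (0, 1, 5):
    print(f"{hops} sequential hops: ~{soa_tax_ms(hops):.1f} ms of overhead per request")
```

With these assumed per-hop costs, five sequential hops add about 7 ms before any real work happens. That is more than half of the roughly 12 ms page render time quoted above.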

Netflix: Started from a Monolith, Migrated Out of Necessity

Netflix ran a DVD-shipping monolith for its first decade. In 2008, a major database corruption caused a three-day service outage. That crisis triggered a seven-year migration to over 700 microservices. Netflix now represents one of the most sophisticated distributed architectures in the industry — but it did not start there. The transition happened in response to a genuine, proven problem: the monolith's single point of failure was catastrophically expensive at their scale.

The lesson: Netflix's microservices architecture was the solution to a specific problem they had, not a template to follow from day one.

| Company | Architecture | Scale Achieved | Key Insight |
|---|---|---|---|
| Shopify | Modular monolith (Rails) | 284M req/min peak (Black Friday 2024) | Microservices attempted, then abandoned; modular discipline + sharding for one component won |
| Stack Overflow | Monolith + 9 servers | 1.3B page views/month, 12ms renders | Aggressive caching eliminates the need for distributed systems |
| Netflix | Monolith → 700+ microservices | 300M+ members globally | Migration driven by a 3-day outage crisis, not premature design |
| Amazon | Monolith → decomposed services | 1.6B packages/year | Decomposed over years of growth, not from day one |
| Friendster | Monolith (undisciplined) | ~115M registered (peak) | No architectural discipline led to technical collapse under load; users migrated to MySpace |

YAGNI: The Most Important Principle You Are Violating

YAGNI (You Aren't Gonna Need It) is an Extreme Programming principle that applies directly to system design:

Never implement something because you foresee you might need it. Implement it when you actually need it.

In practice, YAGNI violations in system design look like this:

  • Adding a message queue before you have enough throughput to require asynchronous processing
  • Designing a multi-region active-active database before you have users in more than one region
  • Building a plugin architecture for an application with 50 users
  • Adding a CDN layer before profiling shows that static asset delivery is your bottleneck
  • Creating a dedicated microservice for a function called from only one place

The YAGNI trap is psychological: experienced engineers recognize that eventually these features will be needed. The mistake is building for "eventually" before validating "right now." Every component added before it is needed costs you in three ways:

  1. Time spent building instead of validating the product
  2. Operational overhead the team must maintain indefinitely
  3. Complexity that makes the actual problem harder to find and fix

The correction: build a feature when a specific, measured need demands it — not when you can imagine a scenario where it might be useful.

Premature Optimization: The Full Picture

Donald Knuth's famous quote is almost always cited incompletely. Here is the longer passage, from his 1974 paper "Structured Programming with go to Statements":

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

The second sentence is almost never quoted. Knuth is not saying "never optimize." He is saying:

  • You cannot know which 3% of your code matters for performance without measuring first.
  • Optimizing the 97% that does not matter wastes time and introduces complexity with no benefit.
  • When you find the 3% that is actually critical (through profiling, not intuition), optimize it carefully.

In system design, this means: do not choose a distributed database because it scales better in theory. Choose it when you have measured that your current database is the actual bottleneck.


The workflow that avoids premature optimization: build first, measure in production, find the specific bottleneck, optimize only that component, then repeat. Any architectural decision that skips the "measure" step is guesswork.
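In Python, the standard library's `cProfile` and `pstats` modules are one way to do the measure step before optimizing anything. The functions below are hypothetical stand-ins for request-handling code. The point is that the profile report identifies which step dominates, instead of leaving it to a guess.

```python
import cProfile
import pstats


def render_page(n: int) -> str:
    # Hypothetical cheap step.
    return "<li></li>" * n


def build_report(n: int) -> list:
    # Hypothetical expensive step: a deliberately quadratic loop.
    return [sum(range(i)) for i in range(n)]


def handle_request() -> None:
    build_report(3_000)
    render_page(3_000)


profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Sort by cumulative time so the call path that dominates the request floats to the top.
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(5)
```

In this toy case the report puts `build_report` far above `render_page`. That is the one place worth optimizing, and the profile shows it without anyone having to guess.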

Practical Scaling Stages: A Realistic Roadmap

Most applications pass through recognizable stages as they grow. Each stage requires a different architecture. The mistake is jumping to stage 5 when you are in stage 1.

| Stage | User Scale | Architecture | What to Do |
|---|---|---|---|
| 1 – Validate | 0 – 1,000 users | Single server, single database, deployed as one unit | Ship fast. Validate product-market fit. Do not add infrastructure you haven't proven you need. |
| 2 – Separate | 1,000 – 10,000 users | App server and database on separate machines | Isolate resources. Add a connection pool (e.g., PgBouncer). Monitor DB query times. |
| 3 – Replicate | 10,000 – 100,000 users | Load balancer + 2–3 app servers + DB read replicas | Multiple app servers handle traffic. Read replicas offload reporting and analytics queries from the primary database. |
| 4 – Cache | 100,000 – 1M users | Add Redis/Memcached for hot data + CDN for static assets | Cache the data your queries hit most often. A CDN reduces latency for global users. At this stage, the database is rarely the bottleneck for reads. |
| 5 – Partition | 1M – 10M users | Database sharding or partitioning by key (e.g., user_id) | Shard the database when a single instance's write throughput becomes the bottleneck. Introduce message queues for async workloads. |
| 6 – Decompose | 10M+ users | Extract services where independent scaling is genuinely required | Decompose only the components that need to scale independently of the rest. Not everything. Not all at once. |
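Stage 4's "cache the data your queries hit most often" commonly takes the form of the cache-aside pattern. The application reads from the cache first, falls back to the database on a miss, and then populates the cache. The minimal in-process sketch below assumes a plain dict with a TTL as a stand-in for Redis or Memcached, and a hypothetical `fetch_user_from_db` in place of the real query.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process stand-in for Redis/Memcached (illustration only)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[Any, Tuple[float, Any]] = {}

    def get(self, key: Any) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: Any, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl, value)


db_calls = 0


def fetch_user_from_db(user_id: int) -> dict:
    """Hypothetical slow database query; counts how often it runs."""
    global db_calls
    db_calls += 1
    return {"id": user_id, "name": f"user-{user_id}"}


cache = TTLCache(ttl_seconds=60)


def get_user(user_id: int) -> dict:
    """Cache-aside read: try the cache, fall back to the DB, then fill the cache."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    user = fetch_user_from_db(user_id)
    cache.set(user_id, user)
    return user


get_user(7)
get_user(7)
print(db_calls)  # 1: the second read was served from the cache
```

On writes, the usual companion step is to delete or overwrite the cached key so the next read does not return stale data. This sketch also cannot cache values that are legitimately `None`.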

A critical distinction: "concurrent users" (users active at the same moment) is not the same as "monthly active users." A monolith typically starts to struggle around 10,000 concurrent users, not 10,000 total monthly users, and those are very different numbers. If your app has 50,000 monthly active users but only 200 active at peak, you are nowhere near the limits of a simple monolith.
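Little's Law (L = λW: the average number of items in a system equals their arrival rate times the average time each one spends there) offers a quick way to turn peak concurrent users into server load. The sketch below uses the 200-users-at-peak example above. It assumes, hypothetically, that each active user issues one request every 10 seconds and that the server answers in 50 ms.

```python
from typing import Tuple


def peak_load(concurrent_users: int, seconds_between_requests: float,
              response_time_s: float) -> Tuple[float, float]:
    """Return (requests per second, requests in flight) via Little's Law.

    arrival rate  lambda = users / seconds between each user's requests
    in flight     L      = lambda * W, where W is the response time
    """
    rps = concurrent_users / seconds_between_requests
    in_flight = rps * response_time_s
    return rps, in_flight


rps, in_flight = peak_load(200, seconds_between_requests=10, response_time_s=0.05)
print(f"{rps:.0f} req/s, about {in_flight:.0f} request(s) in flight")  # 20 req/s, about 1 request(s) in flight
```

Under these assumptions the server sees about 20 requests per second, with roughly one request being processed at any instant. A single application process can handle that comfortably.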

Team size matters as much as user scale. A widely cited rule of thumb holds that microservices provide a net benefit only for teams of roughly 10 or more engineers; below that threshold, the coordination overhead of distributed systems tends to outweigh the benefits. A three-engineer team maintaining 12 microservices is a team spending most of its time managing infrastructure rather than building product.

The Modular Monolith: The Underrated Middle Path

The choice is not "monolith vs. microservices." There is a third architecture that most tutorials skip: the modular monolith.

A modular monolith is a single deployable unit with strictly enforced internal boundaries between modules. Each module owns its own domain logic. Modules communicate only through well-defined public interfaces, never by reading another module's database tables or calling into another module's internal code. Think of it as organizing your codebase the way you would eventually organize separate services, but without the network calls, deployment pipelines, or distributed tracing complexity.
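Packwerk does this for Ruby, but the underlying check is small enough to sketch in any language. The Python below is a hypothetical illustration, not Packwerk's API or configuration format. Each module declares the modules it may depend on, and any observed dependency edge that was not declared is reported as a violation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical declared dependencies: module -> modules it is allowed to use.
ALLOWED: Dict[str, Set[str]] = {
    "orders": {"billing", "catalog"},
    "billing": set(),
    "catalog": set(),
}


def find_violations(imports: List[Tuple[str, str]],
                    allowed: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (from_module, to_module) edges that `allowed` does not declare.

    In a real tool, `imports` would come from scanning source files; here it
    is passed in directly. References within the same module are always fine.
    """
    return [
        (src, dst)
        for src, dst in imports
        if src != dst and dst not in allowed.get(src, set())
    ]


observed = [
    ("orders", "billing"),   # declared dependency
    ("orders", "orders"),    # internal reference
    ("billing", "orders"),   # undeclared: billing reaching back into orders
]
print(find_violations(observed, ALLOWED))  # [('billing', 'orders')]
```

Running a check like this in CI and failing the build on any violation is what turns informal module boundaries into enforced ones.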

Modular Monolith: One Deployment, Clean Internal Boundaries

A modular monolith deploys as one unit — no service mesh, no distributed tracing, no inter-service network calls — but enforces boundaries between modules so that each can eventually be extracted as a service if needed. Shopify has used this pattern to handle Black Friday traffic at global scale.


The distributed monolith trap: The most dangerous architecture is one that looks like microservices but is actually a distributed monolith — services deployed separately but sharing a database, calling each other synchronously in long chains, and unable to be deployed independently without coordinating with other teams. You get all the operational complexity of microservices with none of the benefits. This is the most common outcome when teams decompose too early, before their module boundaries have stabilized.
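Long synchronous chains are costly partly because availabilities multiply when every call must succeed. The quick calculation below assumes that each service independently meets a hypothetical 99.9% availability.

```python
def chain_availability(per_service: float, services: int) -> float:
    """Availability of a request that must pass through `services`
    synchronous calls in series, assuming independent failures."""
    return per_service ** services


MINUTES_PER_MONTH = 30 * 24 * 60  # 30-day month

for n in (1, 5, 10, 20):
    a = chain_availability(0.999, n)
    downtime_min = (1 - a) * MINUTES_PER_MONTH
    print(f"{n:>2} services: {a:.4%} available, ~{downtime_min:.0f} min/month down")
```

Twenty such services in series come out near 98% availability. A single 99.9% monolith serving the same request does not pay that multiplication.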

How AI Coding Agents Make This Worse

AI coding tools directly affect the simplicity vs. scalability tradeoff — and not always in a helpful direction.

A 2025 Carnegie Mellon study analyzed 807 GitHub repositories that adopted AI coding assistants. Key findings:

  • Code complexity increased by over 40% — more than could be explained by codebase growth alone
  • Static analysis warnings increased by approximately 30% and remained elevated months after adoption
  • The productivity spike from AI-generated code "returns to baseline by month three" — the speed boost is temporary; the complexity accumulates permanently

Why do AI agents generate complexity? Because they operate within a single context window with no persistent understanding of your system's architecture, team size, or current scale. When asked to add a feature, they add it — completely and generically. They do not restructure, remove, or simplify. They have no awareness of YAGNI. They generate patterns appropriate for the problem in isolation, not for the system as a whole. Recent agentic coding tools like Claude Code mitigate this with persistent project context (CLAUDE.md files, memory, and multi-file awareness), but the underlying tendency remains: the agent optimizes for correctness and completeness of the immediate request, not for overall system simplicity.

Concretely, an AI agent asked to "add an analytics pipeline" will likely generate one. It will not ask whether you have enough users to need one, whether your existing database can handle the queries, or whether a simple database index would solve the same problem at a tenth of the complexity and cost.
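Checking whether an index solves a slow query is cheap. The sketch below uses Python's built-in `sqlite3` module and a hypothetical `events` table. `EXPLAIN QUERY PLAN` reports a full table scan before the index exists and an index search after it is created. The exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events (user_id, kind) VALUES (?, ?)",
    [(i % 1_000, "click") for i in range(10_000)],
)

QUERY = "SELECT COUNT(*) FROM events WHERE user_id = ?"


def query_plan(sql: str) -> str:
    """Join the human-readable detail column of EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


before = query_plan(QUERY)  # full table scan: no index on user_id yet
conn.execute("CREATE INDEX idx_events_user_id ON events (user_id)")
after = query_plan(QUERY)   # now a search using idx_events_user_id

print("before:", before)
print("after: ", after)
```

A one-line `CREATE INDEX` that you can verify in seconds is worth trying before accepting a new pipeline or service.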

| AI Agent Default Behavior | What You Should Ask Instead |
|---|---|
| Generates the most technically complete implementation of the requested feature | "Do we actually need this level of completeness for our current scale?" |
| Adds new services and infrastructure when asked to solve a scaling problem | "Can this be solved with an index, a cache, or a query optimization instead?" |
| Generates abstract, configurable, generalized solutions by default | "Is there a simpler, more direct solution that works for the one case we have?" |
| Follows patterns from large-scale systems in its training data (Netflix, Google, etc.) | "Are those patterns appropriate for our user count, team size, and operational capacity?" |
| Never deletes or simplifies; always adds | "What can we remove or simplify while still solving the problem?" |

The practical mitigation: Before accepting any AI-generated architectural suggestion, ask it to justify the choice against your current scale:

"We have [X] users and [N] engineers. Is the solution you just generated appropriate for this scale, or are you defaulting to a pattern designed for a larger system? What is the simplest solution that would work at our current scale?"

This forces the agent to surface its scale assumptions explicitly and gives you the information you need to evaluate the suggestion critically.

A Framework for Deciding When to Scale

The question is not "should I build for scale?" — you should always design with the ability to scale in mind. The question is "what should I build right now, and what should I defer?"

The Just-in-Time Scaling Decision

Make architectural decisions at the last responsible moment — when you have the most information about what your system actually needs. Defer irreversible decisions. Treat each architectural choice as reversible until the evidence says otherwise.
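One way to keep these decisions evidence-driven is to write the scaling trigger down as a check against production measurements rather than leave it as a judgment call. The minimal sketch below assumes you already collect request latencies and have chosen a latency objective. The 300 ms threshold and the nearest-rank percentile are illustrative choices, not a standard.

```python
from typing import Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """95th percentile latency using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def should_scale(samples_ms: Sequence[float], slo_p95_ms: float = 300.0) -> bool:
    """Scale only when measured p95 latency breaches the (assumed) objective."""
    return p95(samples_ms) > slo_p95_ms


healthy = [40.0] * 95 + [120.0] * 5
strained = [40.0] * 90 + [450.0] * 10
print(should_scale(healthy), should_scale(strained))  # False True
```

The threshold itself is a reversible decision. What matters is that the infrastructure change waits until the check actually fires.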


The Rule of Three: When to Abstract

A related principle applies at the code level and helps prevent over-engineering in a different form. The Rule of Three, popularized by Martin Fowler's Refactoring, states:

The first time you do something, just do it. The second time you do something similar, do it again. The third time you do something similar, refactor.

Why wait until three occurrences? Abstracting from one or two examples risks choosing the wrong abstraction. Wrong abstractions are worse than duplication because they hide intent, create unintended coupling, and resist change. You need three examples to see the real shape of the pattern — and to know whether those three things are actually the same pattern or merely superficially similar.

The same logic applies to infrastructure: the first time you need to send an email, use a simple library call. The second time, you might notice a pattern. The third time — when you are sending emails from three different places with different retry behaviors — you have enough evidence to build a shared email service.
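The sketch below shows what that third-occurrence refactor might produce. It assumes the three call sites differed mainly in retry behavior, so the result is one shared helper that takes the retry policy as a parameter. `send` is a hypothetical transport callable, not a real library API, and `flaky_send` fails twice to exercise the retry path.

```python
import time
from typing import Callable


class TransientSendError(Exception):
    """Hypothetical error a mail transport raises for a retryable failure."""


def send_with_retry(send: Callable[[str, str], None], to: str, body: str,
                    attempts: int = 3, backoff_s: float = 0.0) -> int:
    """Call `send` until it succeeds; return how many attempts it took.

    The retry policy is a parameter because, in this scenario, that is the
    one thing the three original call sites disagreed on.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            send(to, body)
            return attempt
        except TransientSendError:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            time.sleep(backoff_s * attempt)  # linear backoff; 0 means no wait


failures_left = 2


def flaky_send(to: str, body: str) -> None:
    """Hypothetical transport that fails twice, then succeeds."""
    global failures_left
    if failures_left > 0:
        failures_left -= 1
        raise TransientSendError("temporary SMTP failure")


print(send_with_retry(flaky_send, "ops@example.com", "disk almost full"))  # 3
```

With only one or two call sites, a direct library call at each site would have been simpler. The helper earns its place once three callers share it.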

Summary

| Principle | What It Means in Practice |
|---|---|
| Don't build for 100M users if you have 100 | The architecture correct for 100M users is wrong for 100. Start with the simplest architecture that works for your actual scale. |
| YAGNI | Never implement something because you might need it. Implement it when you have proven you need it. |
| Measure before you optimize | You cannot know the 3% of your system that matters without production data. Optimize only what measurement confirms is the bottleneck. |
| The modular monolith is not a compromise; it is a legitimate architecture | Shopify handles Black Friday at global scale on a Rails monolith. Internal boundaries matter more than service count. |
| Scaling stages are sequential, not optional | Move from one stage to the next when specific, measured pain demands it, not when you imagine it might be needed. |
| Microservices require 10+ engineers to break even | Below this threshold, monoliths consistently outperform distributed systems in delivery speed and operational stability. |
| AI agents generate complexity, not simplicity | AI tools increase code complexity by 40%+ (CMU, 2025). Review AI-generated architectural suggestions against your actual scale before accepting them. |
| Defer reversible decisions; get irreversible decisions right early | Data models and module boundaries are expensive to change, so invest in them upfront. Infrastructure scaling choices are mostly reversible; defer them until you have evidence. |

The developers who build the most durable systems are not the ones who know the most distributed systems patterns. They are the ones who can resist applying a pattern until the problem it solves has actually appeared. Knowing when not to build is the skill that separates systems that survive their success from systems that collapse under it.

Sources: