The Application Layer: Monolith vs. Microservices

Every production system has an application layer — the code that receives incoming requests, executes business logic, and coordinates with databases, caches, and external services. Before writing a single line of that code, you face the foundational architectural question: deploy everything as one unit, or split it across multiple independent services?

This is one of the most consequential decisions in system design. It determines how your team works, how your system fails, and how much operational complexity you take on from day one. The choice also has a direct implication for AI-assisted development: when you prompt an AI agent to build a new feature, the agent tends to produce a monolithic structure, adding code to what already exists and growing a single deployable unit, unless you tell it otherwise. Understanding the trade-offs is how you recognize when to accept that result and when to redirect it.

Monolith

A monolith is an application in which all code is packaged and deployed as a single unit. There is one codebase, one build process, one deployment, and typically one shared database. When the process starts, every feature — user authentication, order management, payment processing, search, notifications — runs together in the same process. Modules communicate through ordinary function calls: no network hop, no data serialization, just one part of the program calling another directly in memory.

This is the consensus starting point. Martin Fowler's MonolithFirst principle states that you should almost always begin with a monolith and extract services only when you have a concrete reason to. The core argument: in the early stages of a product, you don't yet know where the stable service boundaries are. A service boundary is the clean division point where one independent domain ends and another begins — where you can change one side's implementation without the other side being affected. Getting these wrong early and encoding them as distributed service contracts is expensive to undo. Refactoring a module boundary inside a monolith is a code change; refactoring a service boundary in a distributed system means coordinating API changes, data migrations, and deployments across teams. Sam Newman, author of Building Microservices, agrees: adopting microservices should be "a conscious choice made to achieve a specific outcome," not a default.

DHH’s Majestic Monolith makes the practical case concrete: Basecamp’s core product is built around a Rails monolith that powers the web app and the APIs used by its clients, alongside supporting infrastructure like background jobs and email delivery. Shopify, too, has long relied on a large Rails codebase at the center of its platform while scaling to enormous volume. A monolith isn’t a sign of immaturity; for many teams and problem spaces, it can be the most economically rational architecture.

The Monolith: A Single Deployable Unit

All modules — auth, orders, payments, notifications, search — run inside the same process and share a single database. Deploying any change means redeploying the entire application. This is also what AI agents produce naturally when you ask them to build features incrementally — each prompt adds code to the existing app, growing the monolith organically.


A note on ACID: The trade-off card above mentions "ACID transactions." ACID stands for Atomicity, Consistency, Isolation, and Durability — the four guarantees a relational database like PostgreSQL makes about every transaction. Atomicity means a transaction either fully succeeds or fully rolls back; there is no partial write. Consistency means every transaction brings the database from one valid state to another, respecting all defined constraints and integrity rules — no transaction can leave the data in a state that violates those rules. Isolation means concurrent transactions don't interfere with each other. Durability means a committed write survives a crash. In a monolith with one database, you can wrap operations across multiple modules in a single transaction and get all four guarantees for free. In a distributed system, this becomes significantly harder.
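
To make the atomicity guarantee concrete, here is a minimal sketch of two modules sharing one transaction. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL, and the table names, the `make_db` helper, and the `place_order` function are hypothetical, chosen only for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with tables owned by two different modules."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory ("
        " sku TEXT PRIMARY KEY,"
        " stock INTEGER NOT NULL CHECK (stock >= 0))"
    )
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY, sku TEXT NOT NULL, qty INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO inventory (sku, stock) VALUES ('widget', 5)")
    conn.commit()
    return conn


def place_order(conn, sku, qty):
    """Record an order (orders module) and decrement stock (inventory module)
    inside one transaction. Returns False if the transaction was rolled back."""
    try:
        # Used as a context manager, the connection commits if the block
        # finishes and rolls back if an exception escapes it.
        with conn:
            conn.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)", (sku, qty))
            # Violates the CHECK constraint when qty exceeds the current stock.
            conn.execute(
                "UPDATE inventory SET stock = stock - ? WHERE sku = ?", (qty, sku)
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(place_order(conn, "widget", 2))   # enough stock
    print(place_order(conn, "widget", 10))  # not enough stock
    print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

Under these assumptions, the second call fails on the inventory update, and the order row inserted a moment earlier is rolled back with it. The database ends with one order and a stock of 3, never with an order that has no matching stock deduction.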

The Warning Signs: When a Monolith Is Telling You to Split

The monolith is a starting point, not a permanent commitment. The triggers to extract a service are organizational and operational — not simply "the codebase is getting large." Here are the six signals that consistently indicate it is time:

Warning Sign	What It Looks Like	Why It Matters
Deployment bottlenecks	Releasing any feature requires coordinating the whole team and deploying the entire application — small changes become high-ceremony, high-risk events	Every deployment is a coordination tax. Teams slow down. Risk accumulates. Releasing becomes feared rather than routine.
Team size exceeds ~8–10 engineers	Merge conflicts multiply; no single person can understand the full codebase anymore	Conway's Law: your architecture will mirror your org chart. Large teams sharing one codebase create accidental coupling that is expensive to untangle.
Wildly different scaling needs	Your search module consumes 10× the CPU of the rest of the app, but you must scale the entire application to serve it	You pay to scale every module when only one needs more resources, and the waste grows with every instance you add.
Technology diversity requirements	A specific component (ML inference, real-time audio, video transcoding) requires a different runtime or language (Python, Go, C++) that cannot be embedded in the primary stack	Some performance or capability requirements simply cannot be met within a single runtime — extraction becomes technically necessary, not just preferable.
Compliance and data isolation	Regulatory requirements (PCI DSS for payments, HIPAA for health data, GDPR) call for strict isolation of certain data at the infrastructure level	A shared process and shared database make compliance boundaries hard to enforce and hard to audit. Isolating the regulated data in its own service can become a practical, and sometimes legal, necessity.
Stable domain seams have emerged	After living with the system, clear boundaries appear: domains that rarely communicate with others	These are the safe extraction candidates. Splitting before boundaries stabilize is a leading cause of failed microservices migrations: you end up with distributed coupling instead of distributed independence.

Conway's Law, named after software engineer Melvin Conway, who proposed it in 1967, holds that organizations design systems that mirror their own communication structures. In practice, this means if your entire engineering team shares one codebase with no ownership boundaries, the code will reflect that: tightly coupled, with changes in one area routinely touching another. Conversely, if you organize teams around distinct domains (an Auth team, an Orders team, a Payments team), each with clear ownership, those domain boundaries tend to produce clean architectural seams naturally. Team structure is a legitimate architectural signal, not just an HR concern.

A note on AI-assisted development: When you ask an AI agent to add a feature, it will usually implement it inside the existing application rather than propose a new service boundary. That’s often the right default in the early stages. As your system grows, though, you need to recognize the warning signs yourself: deployment coupling, team coordination bottlenecks, scaling hot spots, and compliance or data-boundary constraints. Even if the AI can read some architecture docs or tickets via the Model Context Protocol (MCP), it still lacks full context on your operational history, incident patterns, roadmap trade-offs, and organizational incentives, and it reasons only over the subset of information you expose in its working set.

Microservices

A microservices architecture decomposes the application into independently deployable services, each responsible for a specific domain and owning its own database. Services communicate over the network — via HTTP/REST, gRPC, or message queues — rather than in-process function calls.

A central design goal is fault isolation: limiting how much of the system is affected when one component fails. This scope of impact is called the blast radius, a term borrowed from the physical world, where it describes the area damaged by an explosion. In a monolith, a fault that takes down the process (a crash, a memory leak, an exhausted thread or connection pool) takes down every module with it, so the blast radius can reach 100%: every feature goes down for every user simultaneously. In microservices, a failure is bounded by the service boundary: only the features that depend on that service are affected, while the other services keep running.

Microservices

In a microservices architecture, each service is independently deployable, owns its own database, and communicates over the network. An API Gateway handles routing, authentication, and rate limiting for all inbound client traffic. Services that don't need an immediate response communicate asynchronously through a message queue, which decouples their availability and limits the blast radius when one service fails.


Containing failures in practice requires more than just splitting services — it requires active protection at every call site between them. The key mechanism is the circuit breaker, a pattern borrowed from electrical engineering. Just as a physical circuit breaker trips when current exceeds a safe threshold — cutting the circuit before the wiring overheats — a software circuit breaker monitors outgoing calls to a downstream service. When the failure rate crosses a threshold (say, 50% of calls failing within a 10-second window), the breaker trips and immediately rejects all further calls to that service for a short cooling-off period, returning a cached fallback or a clear error response instead. Without it, every caller blocks waiting for a response that never comes, consuming threads and memory until the entire system grinds to a halt — a cascading failure. The circuit breaker is what ensures a failing Payments service stays a Payments problem rather than becoming a system-wide outage.
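
The behavior described above can be sketched in a few dozen lines. This is a simplified illustration, not a production implementation: the class name, parameter names, and default values are assumptions, and the single trial call allowed after the cooling-off period (often called the half-open state) is a common refinement that the description above does not spell out.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=0.5, window_seconds=10.0,
                 cooldown_seconds=5.0, min_calls=4, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # e.g. 0.5 means 50% failures
        self.window_seconds = window_seconds        # how far back to look
        self.cooldown_seconds = cooldown_seconds    # how long to stay open
        self.min_calls = min_calls                  # avoid tripping on one bad call
        self.clock = clock                          # injectable for testing
        self.results = []                           # (timestamp, succeeded) pairs
        self.opened_at = None                       # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        now = self.clock()
        if self.opened_at is not None and now - self.opened_at < self.cooldown_seconds:
            # Fail fast instead of tying up the caller on a dead dependency.
            raise CircuitOpenError("downstream calls are suspended")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record(now, succeeded=False)
            raise
        self._record(now, succeeded=True)
        return result

    def _record(self, now, succeeded):
        if self.opened_at is not None:
            # This was the trial call made after the cooling-off period.
            if succeeded:
                self.opened_at = None   # service recovered: close the breaker
                self.results = []
            else:
                self.opened_at = now    # still failing: restart the cooldown
            return
        # Keep only results inside the sliding window, then add the new one.
        self.results = [(t, ok) for t, ok in self.results
                        if now - t <= self.window_seconds]
        self.results.append((now, succeeded))
        failures = sum(1 for _, ok in self.results if not ok)
        if (len(self.results) >= self.min_calls
                and failures / len(self.results) >= self.failure_threshold):
            self.opened_at = now        # trip the breaker
```

A caller would wrap each outbound request, for example `breaker.call(charge_customer, order)` where `charge_customer` is a hypothetical client function, and catch `CircuitOpenError` to return the cached fallback or a clear error response.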

What a Microservices Architecture Looks Like in Practice

The diagram in the card above shows a typical microservices deployment. Three structural decisions distinguish it from a monolith and are worth understanding explicitly:

API Gateway — A single entry point for all client requests. The gateway is responsible for routing each request to the correct service, enforcing authentication, and applying rate limits. Without it, every client would need to know the network address of every service, authenticate against each one separately, and handle routing logic on its own. The gateway centralizes all of this into one place, so each individual service can focus entirely on its own domain.
Per-service databases — Each service owns its data exclusively. The Orders service cannot directly query the Payments database; it must call the Payments service's API. This boundary is not optional: a shared database is one of the most common ways microservices migrations fail. Consider what happens when teams share a database: if the Payments team renames a column from amount to amount_cents, every other service reading that column breaks simultaneously. One service running a slow, expensive query can hold database locks that block an unrelated service. The shared database becomes the hidden coupling point that defeats the whole purpose of the split. Each service must own its data, and other services must go through that service's API to access it.
Asynchronous communication — The Orders service does not call Payments directly and wait for a response. Instead, it publishes an order event to a message queue. The Payments service reads from that queue and processes the event when it is ready. If Payments is temporarily down, the Orders service keeps running normally — orders pile up in the queue, and Payments catches up when it recovers. There is no cascading failure, just a temporary backlog. This decoupling is the primary mechanism that makes fault isolation work in practice: when services communicate asynchronously, the health of one service no longer determines the health of another.
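
The asynchronous hand-off between Orders and Payments can be illustrated with a toy, in-process model. Everything here is an assumption made for the sketch: the `MessageQueue` class stands in for a real durable broker, and the service classes, method names, and message fields are hypothetical.

```python
from collections import deque


class MessageQueue:
    """In-memory stand-in for a durable message broker."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def poll(self):
        """Return the oldest message, or None if the queue is empty."""
        return self._messages.popleft() if self._messages else None

    def __len__(self):
        return len(self._messages)


class OrdersService:
    def __init__(self, queue):
        self.queue = queue
        self.orders = []

    def place_order(self, order_id, amount_cents):
        # Orders never calls Payments directly; it only publishes an event.
        self.orders.append(order_id)
        self.queue.publish({"type": "order_placed",
                            "order_id": order_id,
                            "amount_cents": amount_cents})
        return "accepted"


class PaymentsService:
    def __init__(self, queue):
        self.queue = queue
        self.available = True   # flip to False to simulate an outage
        self.charged = []

    def drain(self):
        """Process every waiting event; do nothing while the service is down."""
        if not self.available:
            return 0
        processed = 0
        message = self.queue.poll()
        while message is not None:
            self.charged.append(message["order_id"])
            processed += 1
            message = self.queue.poll()
        return processed


if __name__ == "__main__":
    queue = MessageQueue()
    orders, payments = OrdersService(queue), PaymentsService(queue)
    payments.available = False               # Payments goes down
    orders.place_order(1, 1999)              # Orders keeps accepting work
    orders.place_order(2, 4500)
    print(len(queue), payments.drain())      # backlog grows, nothing processed
    payments.available = True                # Payments recovers
    print(payments.drain(), len(queue))      # backlog is worked off
```

The sketch leaves out what a real broker adds, such as persistence, acknowledgements, and redelivery, but it shows the key property: the Orders code path contains no reference to the health of Payments.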

The Complexity Tax

Microservices solve specific organizational and scaling problems — but they introduce an entirely new category of problems that simply do not exist in a monolith.

New Problem	What It Means	What You Need
Network failures	Every inter-service call can fail, time out, or return corrupted data. A function call never fails this way — a network call always can.	Retry logic, timeouts, circuit breakers, and defined fallback behavior on every service boundary
Distributed tracing	A single user request might touch 5 services. Slowness or a bug anywhere in the chain is hard to locate unless you can correlate all 5 log streams.	Tracing infrastructure (Jaeger, Zipkin, Datadog APM) that tags every span and log entry with a shared trace ID and follows it across service boundaries
Service discovery	Services need to find each other's network addresses dynamically as instances scale up and down. Hardcoded addresses break instantly in a dynamic environment.	A service registry (Kubernetes DNS, Consul, AWS Cloud Map) that keeps addresses current as instances come and go
Cross-service data consistency	You cannot wrap two services in a database transaction. If Orders writes successfully but Payments fails, the data is in an inconsistent state.	The Saga pattern or event sourcing to achieve eventual consistency across services without distributed transactions
Operational overhead	N services means N CI/CD pipelines, N monitoring dashboards, N deployment configs, N on-call runbooks — all of which must be maintained.	Platform engineering investment: container orchestration (Kubernetes), centralized logging, a service mesh, and on-call tooling per service
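
As an illustration of the "Network failures" row, the sketch below shows one common shape for retry logic: a bounded number of attempts with exponential backoff and jitter. The function name, parameters, and defaults are assumptions for this example, and retrying is only safe when the remote operation is idempotent or deduplicated on the server side.

```python
import random
import time


def call_with_retries(func, attempts=3, base_delay=0.1, max_delay=2.0,
                      retry_on=(ConnectionError, TimeoutError),
                      sleep=time.sleep):
    """Call func(), retrying transient failures with exponential backoff.

    Re-raises the last exception once the attempts are exhausted, so the
    caller can apply its own fallback behavior.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            # Delay doubles each attempt (0.1s, 0.2s, 0.4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out retries so callers do not hammer a
            # recovering service in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

A call site would pass a zero-argument function that performs the request with its own timeout, for example `call_with_retries(lambda: fetch_payment_status(order_id))`, where `fetch_payment_status` is a hypothetical client function.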

What is the Saga pattern? When an operation spans multiple services — say, a checkout flow that must deduct inventory (Inventory service) and charge the customer (Payments service) — you cannot wrap both writes in a single database transaction. A Saga breaks the operation into a sequence of local transactions, one per service. Each step publishes an event when it succeeds. If any step fails, compensating transactions undo the previous steps. In the checkout example: if Payments fails after inventory was already deducted, a compensating transaction adds the stock back. The result is eventual consistency — the system won't be consistent at every instant, but it will reach a consistent state once all steps complete or compensate. This is more complex to build and reason about than a simple database transaction, but it is the standard approach in microservices architectures.
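
The checkout example can be sketched as a small orchestrated saga. This is an in-process simplification: in a real system each step would be a local transaction in a separate service, triggered by events or API calls, and the `run_saga` function, the `SagaFailed` exception, and the step functions are all hypothetical names.

```python
class SagaFailed(Exception):
    """Raised after a step fails and earlier steps have been compensated."""


def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If a step fails, run the compensations of all completed steps in
    reverse order, then raise SagaFailed.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            for _, undo in reversed(completed):
                undo()
            raise SagaFailed(f"step {name!r} failed; earlier steps were undone") from exc
        completed.append((name, compensate))


# Hypothetical state owned by two different services.
stock = {"widget": 5}
payments_online = False


def deduct_inventory():
    stock["widget"] -= 2


def restore_inventory():
    stock["widget"] += 2   # compensating transaction for deduct_inventory


def charge_customer():
    if not payments_online:
        raise ConnectionError("Payments service unavailable")


def refund_customer():
    pass  # compensating transaction for charge_customer (unused in this run)


checkout = [
    ("deduct inventory", deduct_inventory, restore_inventory),
    ("charge customer", charge_customer, refund_customer),
]

if __name__ == "__main__":
    try:
        run_saga(checkout)
    except SagaFailed as err:
        print(err)
    print(stock)  # the deduction has been compensated
```

Between the inventory deduction and its compensation, other readers can observe the reduced stock. That window is the eventual consistency described above, and it is the main thing a saga trades away compared with a single database transaction.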

What is a service mesh? A service mesh is an infrastructure layer, typically a sidecar proxy running alongside each service, that handles inter-service communication on the services' behalf: encrypting traffic, enforcing retry policies, collecting metrics, and applying circuit breakers. Tools like Istio and Linkerd implement this pattern. Without a service mesh, each service must implement these concerns in its own code; with one, they are handled transparently in the proxy, outside application code. It is optional for smaller microservices deployments but becomes increasingly valuable as the number of services grows.

The Amazon Prime Video lesson (2023): Amazon's Prime Video team moved a specific monitoring system from a microservices/serverless architecture to a monolith and reported cutting its infrastructure costs by 90%. The serverless microservices architecture hit throughput limits for high-volume data processing, and the cost of moving data between services through S3 was prohibitively expensive at that scale. Even inside one of the world's most microservices-committed organizations, specific workloads are better served by a consolidated architecture. The architecture should fit the problem. Microservices are a tool, not a destination.

Prompting AI Agents for Each Architecture

The architecture you choose changes what you need to tell AI agents when you ask them to build features.

Context	Include in Your Prompt	What AI Gets Wrong Without It
Building in a monolith	"Add this to the [orders] module. Follow the existing data access pattern used in [path/to/similar/module]. Use the shared DB connection from [lib/db.ts]. Do not introduce a new database client."	AI adds a second database client, introduces a new configuration file, or creates a second connection pool, needlessly duplicating infrastructure the monolith already provides
Extracting a service	"Extract the Payments module into a standalone HTTP service. It must own its own database tables. The Orders service will call it via REST at [endpoint]. Define the API contract first, then implement both sides."	AI moves the code but forgets to define a stable API contract, leaves the database shared, or omits error handling for network failures that did not exist before extraction
Building in microservices	"This is the Orders Service. It communicates with Payments asynchronously via the message queue. Handle Payments unavailability gracefully. Use the existing retry utility in [lib/retry.ts]. Do not make a direct HTTP call to Payments."	AI writes a direct synchronous call to Payments, misses error handling for network failures, or ignores the established async communication pattern entirely

Summary

Concept	The Practical Rule
Start with a monolith	The industry consensus (Fowler, Newman, DHH) for new products and small teams. Microservices are a cost you pay only when you have a specific problem that justifies them.
Modular monolith	Even inside a monolith, organize code into clear modules with explicit interfaces. This is what makes future extraction possible — and what makes AI agents more effective right now.
Blast radius	The scope of impact when a component fails. In a monolith: up to 100%, because one process crash takes everything down. In microservices: bounded by the service boundary, provided you use circuit breakers, per-service databases, and async communication.
Six warning signs	Deployment bottlenecks, team size >8–10, divergent scaling needs, technology diversity requirements, compliance isolation, stable domain seams. These are the triggers — not 'the codebase is large.'
The complexity tax	Microservices add: network failure modes, distributed tracing, service discovery, cross-service consistency (Sagas), and N× operational overhead. The architecture must justify this cost.
AI agents need architectural context	AI doesn't know your module structure, communication patterns, or data ownership boundaries unless you specify them, e.g., in CLAUDE.md. The more architectural context you provide, the less the agent will accidentally violate it.
