What Diagrams Miss: Budgets, Legacy, and Org Reality

Draw a clean architecture diagram. Add boxes for your services, arrows for your data flows, a cache in front of your database. The diagram is technically sound — it would pass any whiteboard interview.

Now hand it to an engineering team at a real company, and watch what happens.

The questions come quickly: "What does this cost to run per month?" "Which of these services replaces the payments service we already have?" "Who owns the auth box — is that the platform team or us?" "We want to add a cache, but the team that owns the database requires a two-week review for any changes."

None of those questions appear on the diagram. That gap is what this tutorial is about.

Architecture diagrams are intent documents, not reality maps. They capture the ideal structure of a system under ideal conditions — no budget constraints, no existing code, no team boundaries, no political friction. Production systems live in the space between that ideal and the real world. Understanding what diagrams miss is what separates engineers who design on paper from engineers who actually ship.

The Eight Failure Points in Every Arrow

Before looking at organizational and financial realities, start with a physical one: the arrows in your diagram are misleading you. Each arrow represents a network call — and every network call has eight independent points where it can fail.

A Single Network Call Has Eight Ways to Fail

Diagrams draw one arrow between two boxes. In production, a single request must survive eight distinct failure points. Any one of them can fail independently — and unlike a crash that takes everything down at once, partial failures leave the system in an unknown state that is much harder to recover from.

L. Peter Deutsch and colleagues at Sun Microsystems codified these realities as the Fallacies of Distributed Computing, a list of assumptions engineers commonly make that are reliably false in production. The eight fallacies are: the network is reliable; latency is zero; bandwidth is infinite; the network is secure; topology does not change; there is one administrator; transport cost is zero; and the network is homogeneous. Diagrams encode all eight by default. Every clean arrow is an implicit assumption that the network will work perfectly, every time.
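The partial-failure case described above (the server commits a write, but the response never reaches the client) can be sketched in a few lines. This is a minimal illustration, not a production client: `FlakyChargeServer`, `charge_with_retry`, and the idempotency-key scheme are hypothetical names invented here, and the "network" is simulated in memory.

```python
import uuid


class FlakyChargeServer:
    """Hypothetical in-memory server: applies the write, then loses the first response."""

    def __init__(self):
        self.balance = 0
        self.seen = {}                  # idempotency key -> stored result
        self.drop_next_response = True  # simulate exactly one lost response

    def charge(self, idempotency_key, amount):
        # Apply the write only if this key has not been processed before.
        if idempotency_key not in self.seen:
            self.balance += amount
            self.seen[idempotency_key] = {"status": "ok", "balance": self.balance}
        # The write has happened; the response may still be lost in transit.
        if self.drop_next_response:
            self.drop_next_response = False
            raise TimeoutError("response lost after the server committed")
        return self.seen[idempotency_key]


def charge_with_retry(server, amount, max_attempts=3):
    """Retry on timeout, reusing one idempotency key across all attempts."""
    key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return server.charge(key, amount)
        except TimeoutError:
            if attempt == max_attempts:
                raise


server = FlakyChargeServer()
result = charge_with_retry(server, amount=100)
print(result["balance"], server.balance)  # 100 100
```

Without the key check on the server side, the retry would apply the charge a second time. The retry alone does not make the call safe; the idempotency key does.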

Budgets: The Costs That Don't Fit in a Box

Architecture diagrams have no line for cost. Adding a box for a managed Kubernetes cluster, a Redis cache, a vector database, and a CDN is free on a whiteboard. In production, it is not.

The three budget realities that diagrams miss:

1. The Infrastructure Bill

Cloud costs scale with every component you add, and the billing model is designed so that the real costs only become visible after you have already committed to the architecture.

| Component | What the Diagram Shows | What the Bill Shows |
| --- | --- | --- |
| Inter-region data transfer | An arrow between two regions | Cloud providers such as AWS and Azure charge per GB for data crossing regions and, depending on the provider, availability zones. A busy microservices system with chatty cross-AZ communication can generate thousands of dollars monthly in egress fees before any external traffic is factored in. |
| Managed databases | A database cylinder | Managed databases (RDS, Cloud SQL) charge separately for compute, storage, IOPS, backup retention, and read replicas. A 'small' database with read replicas and automated backups can cost 3–5× the base compute rate. |
| Observability tooling | Not shown at all | Distributed tracing, centralized log aggregation, and metrics storage for a 10-service system can easily cost $500–$2,000/month before you have 1,000 users. The monitoring infrastructure often costs as much as the application infrastructure itself. |
| LLM API calls | Not shown at all | As covered in the Foundations section: 1M DAU × 5 AI requests per day × $3/1M tokens can reach $675,000/month (a figure that implies roughly 1,500 tokens per request over a 30-day month). This number does not appear on any architecture diagram. |
| Idle standby capacity | Not shown | High-availability setups require standby replicas and failover nodes that are always running, and always billed, even when serving zero traffic. HA is not free redundancy; it is paid-for redundancy. |
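The LLM row is the easiest one to turn into arithmetic. The sketch below reproduces the $675,000/month figure under two assumptions that are not stated in the table: about 1,500 tokens per request and a 30-day month. The function name and parameters are illustrative.

```python
def monthly_llm_cost(daily_active_users, requests_per_user_per_day,
                     tokens_per_request, usd_per_million_tokens, days=30):
    """Estimate monthly LLM API spend from usage and a flat per-token price."""
    tokens_per_month = (daily_active_users * requests_per_user_per_day
                        * tokens_per_request * days)
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


# tokens_per_request=1_500 is an assumption chosen so the result matches the table.
cost = monthly_llm_cost(1_000_000, 5, tokens_per_request=1_500,
                        usd_per_million_tokens=3.0)
print(f"${cost:,.0f} per month")  # $675,000 per month
```

Changing any single input (tokens per request, price tier, request count) moves the total linearly, which makes this a useful first check before an 'AI service' box goes on the diagram.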

2. The Engineering Time Bill

Every component in the diagram requires engineering time to build, deploy, monitor, debug, and maintain. This time is rarely counted when an architecture is first proposed.

| Activity | Typical Hidden Cost |
| --- | --- |
| Initial setup of a new service | 1–3 days for a basic scaffold, CI/CD pipeline, monitoring, and runbook, before any business logic is written |
| On-call overhead per service | Engineering teams consistently report several hours of monthly maintenance overhead per infrastructure component for patching, dependency updates, and alert tuning |
| Incident response for a new component | Each new component introduces a new failure mode that the on-call engineer must learn to debug. Debugging time scales with the number of services within the blast radius of a failure. |
| Cross-team coordination | When a feature touches services owned by different teams, coordination overhead grows faster than linearly. One team requires no external coordination; four teams can require up to six separate bilateral coordination channels. |
| Technical debt accumulation | Studies consistently find that 10–20% of engineering capacity is consumed by existing technical debt, and the majority of development teams report carrying significant debt. Every new component added today contributes to this burden over time. |
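The "four teams, six channels" figure comes from counting distinct pairs of teams, n(n − 1)/2. A small sketch (the function name is illustrative) shows how quickly that number grows:

```python
def coordination_channels(team_count):
    """Number of distinct team pairs: n * (n - 1) / 2."""
    return team_count * (team_count - 1) // 2


for teams in (1, 2, 4, 8):
    print(teams, coordination_channels(teams))
# 1 0
# 2 1
# 4 6
# 8 28
```

Doubling the number of teams from four to eight more than quadruples the possible bilateral channels, which is the "faster than linearly" growth the table refers to.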

3. The Build vs. Buy Cost

When an AI agent generates an architecture, it tends to produce custom implementations of components that already exist as managed services — because building from scratch is what it does by default. The practical question is always: "Could we use a managed service instead, and what is the actual total cost difference?"

A managed Redis instance (ElastiCache, Upstash) typically costs $20–200/month and provides a cache with automatic failover, very little operational overhead, and SLA guarantees. Building and operating your own Redis cluster requires container management, persistent volume configuration, backup policies, failover logic, and on-call coverage for every incident. For most early-stage teams, the managed option is cheaper in total cost, even if its infrastructure line item appears higher.

Before drawing a new box, ask: "What is the total cost of this component — infrastructure, engineering time, and ongoing maintenance — compared to not having it or using a managed alternative?"
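One way to make that question concrete is to put infrastructure and engineering time into the same unit. The sketch below is a rough model only: `ComponentCost` is a hypothetical helper, and every number in the example (hours, hourly rate, the self-hosted infrastructure price) is a placeholder to be replaced with your own estimates.

```python
from dataclasses import dataclass


@dataclass
class ComponentCost:
    name: str
    infra_usd_per_month: float
    setup_hours: float                  # one-time engineering effort
    maintenance_hours_per_month: float  # patching, alerts, incidents

    def total_usd(self, months, hourly_rate_usd):
        """Infrastructure bill plus engineering time over the given period."""
        hours = self.setup_hours + self.maintenance_hours_per_month * months
        return self.infra_usd_per_month * months + hours * hourly_rate_usd


# Placeholder inputs for illustration only.
managed = ComponentCost("managed Redis", infra_usd_per_month=200,
                        setup_hours=4, maintenance_hours_per_month=1)
self_hosted = ComponentCost("self-hosted Redis", infra_usd_per_month=60,
                            setup_hours=24, maintenance_hours_per_month=6)

for option in (managed, self_hosted):
    print(option.name, option.total_usd(months=12, hourly_rate_usd=100))
# managed Redis 4000
# self-hosted Redis 10320
```

With these placeholder inputs the option with the higher infrastructure line item is the cheaper one overall. Different inputs can reverse the result, which is the reason to run the numbers rather than assume.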

Legacy Code: The Constraints Below the Whiteboard

Every system design discussion at a real company happens in the shadow of code that already exists. The whiteboard is blank; the codebase is not.

The Real Starting Point: Existing Systems Everywhere

Textbook system design starts from a blank slate. Production system design starts from what already exists. Every box you draw on a whiteboard corresponds to either a new component you must build, an existing component you must integrate with, or an existing component you must replace — and replacement is almost always harder and riskier than it looks.

The Shopify data point: Shopify dedicates 25% of its development cycles specifically to reducing technical debt, running formal "debt sprints" as a regular part of its engineering calendar. This is not an accident — it reflects the empirical reality that debt accumulates faster than teams realize, and that only deliberate, scheduled investment keeps it from consuming engineering capacity entirely.

For developers working with AI agents, legacy code introduces a specific risk: even agents that can read your codebase may not understand the full context behind it. Modern coding agents like Claude Code can navigate files, grep for symbols, and trace dependencies — but reading code is not the same as understanding why it exists in its current form. An agent asked to "add authentication" can find your existing auth service, but it may not know that the team deliberately chose session tokens over JWTs for compliance reasons, that the user schema has a frozen column that cannot be altered without a migration committee review, or that three other services depend on the exact shape of the session cookie. The agent can see what exists; it cannot always infer the constraints and decisions that shaped it.

The mitigation is explicit architectural context. When prompting an agent to build anything that touches legacy systems, supplement its ability to read the code with the knowledge it cannot derive from the code alone: why certain patterns were chosen, which schemas are frozen, what cross-team dependencies exist, and which components have implicit contracts that are not documented in the codebase. The agent can read your files — but the organizational and historical constraints that govern how those files can change are yours to provide.
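As a sketch of what "explicit architectural context" might look like in practice, the constraints can be kept as structured data and rendered into the prompt. The `Constraint` type and `render_context` function are hypothetical; the example entries reuse the session-token, frozen-column, and cookie-shape cases described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    subject: str  # the service, schema, or contract the constraint applies to
    rule: str     # what must or must not change
    reason: str   # the history the code itself does not record


def render_context(constraints):
    """Turn constraints into a plain-text block to place ahead of the task."""
    lines = ["Architectural constraints not visible in the code:"]
    for c in constraints:
        lines.append(f"- {c.subject}: {c.rule} (reason: {c.reason})")
    return "\n".join(lines)


constraints = [
    Constraint("auth service", "keep session tokens; do not introduce JWTs",
               "compliance requirement"),
    Constraint("user schema", "do not alter the frozen column",
               "changes need migration committee review"),
    Constraint("session cookie", "preserve the exact cookie shape",
               "three other services depend on it"),
]

prompt = render_context(constraints) + "\n\nTask: add authentication."
print(prompt)
```

Keeping the constraints in a reviewable file alongside the code also gives new team members the same context the agent needs.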

Organizational Friction: Conway's Law in Your Architecture

In a paper written in 1967 and published in 1968, computer scientist Melvin Conway made an observation that has proven more durable than most software frameworks:

"Organizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations."

This is Conway's Law. It means that the architecture of a system tends to reflect the structure of the organization that built it — not always by design, but as a natural consequence of how teams communicate and collaborate.

Conway's Law: Your Org Chart Is Your Architecture Diagram

When teams can communicate freely, they can build tightly integrated systems. When teams are separated by time zones, reporting structures, or budget boundaries, they build loosely coupled systems with well-defined interfaces — because that is the only way to ship independently. The system architecture tends to mirror the communication paths that actually exist in the organization, whether that was intended or not.

A concrete trace: If you inherit a codebase where the payments and authentication systems share a database, and neither team can change the schema without a committee review, you are not looking at a technical problem. You are looking at an organizational one — two teams who once shared ownership of a system and now cannot disentangle it without coordinating across structures that were built for a different purpose. The architecture is an archaeological record of how the organization used to communicate.

Conway's Law also predicts a failure mode specific to AI-assisted development. An AI agent asked to "add a notification service" will generate a technically sound, independent service. It will not know that notifications are owned by the platform team, require a security review, depend on a specific internal SDK, and must go through a specific on-call rotation. The organizational constraints that shape how a service can be built and operated are completely invisible to the agent.

The mitigation is to make those constraints explicit before generating code. "This service must integrate with the platform team's notification SDK, cannot require direct database access, and must use the standard internal auth middleware" is context the agent cannot derive on its own, but it makes the difference between generated code that ships and code that stalls in review.

"Just Add a Cache" Isn't Free

Caching is one of the most commonly proposed performance solutions in system design — and one of the most commonly misunderstood in production. A cache appears on a diagram as a single box. In practice, it is a new distributed subsystem with its own failure modes, consistency implications, and operational overhead.

What a Cache Actually Adds to Your System

Adding a cache looks simple on a diagram: put a box between the service and the database. In production, that box introduces cache invalidation logic, TTL management, stampede risk, memory pressure, and a new consistency boundary. Each of these is a new failure mode that did not exist before.

The "just add a cache" decision framework. Before adding a cache, the decision must pass a practical test: the expected savings in database load and latency must clearly outweigh the added costs of cache infrastructure, invalidation logic, and the risk of serving stale data. For a query that takes 5ms on a modern database, returns user-specific data that changes every few minutes, and is called infrequently, the overhead of checking the cache, managing TTLs, and handling cold misses can easily exceed the savings entirely. The benefit has to be real and measurable, not assumed.
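The latency half of that test can be approximated with a simple model: every request pays the cache lookup, and only misses also pay the database query. The sketch below covers latency only and ignores invalidation and staleness costs. The 1 ms cache lookup and the hit rates are assumptions; the 5 ms query time is the example from the paragraph above.

```python
def expected_latency_ms(hit_rate, cache_lookup_ms, db_query_ms):
    """Average latency with a cache in front of the database."""
    # Every request checks the cache; misses fall through to the database.
    return cache_lookup_ms + (1.0 - hit_rate) * db_query_ms


def cache_pays_off(hit_rate, cache_lookup_ms, db_query_ms):
    """True only if the cached path is faster on average than going direct."""
    return expected_latency_ms(hit_rate, cache_lookup_ms, db_query_ms) < db_query_ms


# Infrequently called, fast-changing data: assume a 10% hit rate.
print(f"{expected_latency_ms(0.1, 1.0, 5.0):.1f}", cache_pays_off(0.1, 1.0, 5.0))
# Hot, stable data: assume a 90% hit rate.
print(f"{expected_latency_ms(0.9, 1.0, 5.0):.1f}", cache_pays_off(0.9, 1.0, 5.0))
# 5.5 False
# 1.5 True
```

In this model the break-even hit rate is `cache_lookup_ms / db_query_ms`, which is 20% for the numbers above. A workload below that rate gets slower with a cache, before any of the operational costs are counted.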

What AI agents still get wrong here: Modern coding agents like Claude Code can read your query logic, check response times in logs, and even run profiling commands — but they still default to proposing a cache when asked to "make this faster," because caching is the most well-represented performance pattern in their training data. A capable agent may inspect the code before suggesting a solution, but it cannot measure production latency under realistic load, observe actual database contention patterns, or know that your team already tried Redis and abandoned it due to invalidation bugs. Without that operational context, even a well-informed agent will reach for the familiar pattern rather than confirming the bottleneck first.

The correct order: measure first, cache second. Run the operation without a cache under realistic load. Identify the actual bottleneck. Only if the database query is confirmed to be the bottleneck should you design the caching layer — including TTLs, invalidation triggers, and a stampede mitigation strategy.
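If measurement does confirm the database as the bottleneck, the stampede mitigation is worth designing up front. One common approach, shown here as an in-process sketch, is to let only one caller reload an expired key while the others wait for its result. `SingleFlightCache` is a hypothetical name, and a real deployment with a shared cache such as Redis would need a distributed equivalent.

```python
import threading
import time


class SingleFlightCache:
    """TTL cache where only one thread reloads a missing or expired key."""

    def __init__(self, loader, ttl_seconds):
        self._loader = loader
        self._ttl = ttl_seconds
        self._entries = {}    # key -> (value, expires_at)
        self._key_locks = {}  # key -> lock guarding reloads of that key
        self._guard = threading.Lock()

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def _fresh(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key):
        entry = self._fresh(key)
        if entry is not None:
            return entry[0]
        with self._lock_for(key):
            entry = self._fresh(key)  # re-check: another thread may have loaded it
            if entry is not None:
                return entry[0]
            value = self._loader(key)
            self._entries[key] = (value, time.monotonic() + self._ttl)
            return value


calls = []

def slow_loader(key):
    # Stand-in for an expensive database query.
    calls.append(key)
    time.sleep(0.05)
    return key.upper()

cache = SingleFlightCache(slow_loader, ttl_seconds=60)
results = []
threads = [threading.Thread(target=lambda: results.append(cache.get("user:1")))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls), len(results))  # 1 10
```

Ten concurrent requests for the same cold key produce one database call instead of ten. Adding random jitter to TTLs is a complementary technique that this sketch does not show.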

Putting It Together: What to Ask Before Drawing a Box

The habits that make diagram-to-production transitions successful come down to asking the questions that the diagram does not capture.

| What the Diagram Shows | The Question It Doesn't Answer |
| --- | --- |
| A new service box | Who owns this service? What team is on-call for it? Does this organization have the capacity to operate another service? |
| An arrow between two services | What is the latency budget for this call? What happens if this call fails after the caller's side has already committed a state change? |
| A cache box | Is the database actually the bottleneck? What is the TTL strategy? What happens when the cache is empty (cold start) or when a popular entry expires simultaneously for thousands of users? |
| A new database | Does a database already exist that could serve this use case? Who approves schema changes? What is the backup and recovery strategy? |
| A message queue | Is asynchronous processing actually needed here, or is a synchronous call with a retry strategy sufficient? What happens to messages that repeatedly fail: where do they go and who handles them? |
| A third-party service | What is this service's SLA? What happens to our system when it is slow or unavailable? What are the egress costs for traffic to and from it? |
| An 'AI service' box | What is the token budget per request? What is the monthly cost at target scale? What is the fallback when the AI service is unavailable or returns an unexpected response? |
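The message-queue question ("where do repeatedly failing messages go?") has a conventional answer: a dead-letter queue. The sketch below is an in-memory illustration with hypothetical names (`drain`, `handler`, the `order-*` messages); real brokers provide their own dead-letter mechanisms, which this does not attempt to model.

```python
from collections import deque


def drain(messages, handler, max_attempts=3):
    """Process messages; return those that failed max_attempts times."""
    pending = deque(messages)
    attempts = {}      # message -> number of failed attempts so far
    dead_letters = []  # (message, last error) pairs for a human to inspect
    while pending:
        message = pending.popleft()
        try:
            handler(message)
        except Exception as error:
            attempts[message] = attempts.get(message, 0) + 1
            if attempts[message] >= max_attempts:
                dead_letters.append((message, str(error)))
            else:
                pending.append(message)  # requeue for another attempt
    return dead_letters


handled = []

def handler(message):
    # Hypothetical handler: one message can never be processed.
    if message == "order-2":
        raise ValueError("malformed payload")
    handled.append(message)

dead = drain(["order-1", "order-2", "order-3"], handler)
print(handled, dead)
# ['order-1', 'order-3'] [('order-2', 'malformed payload')]
```

The code is the easy part. The design question from the table remains: someone has to own the dead-letter list and decide what happens to its contents.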

These questions are not obstacles — they are the design work. Answering them before writing code is what separates architectures that survive production from architectures that look good on a slide.

For AI-assisted development specifically: Modern coding agents can read your codebase, discover existing services, and adapt their output to match your project's patterns — but they still have no visibility into your budget, your team structure, your on-call capacity, or the approval processes that govern what can actually ship. An agent like Claude Code will find your existing database schema before proposing a new one, but it cannot know that your infrastructure budget is frozen this quarter, that the platform team requires a two-week review for new services, or that your on-call rotation is already stretched thin. The operational questions in the table above are exactly the ones that live outside the codebase — and therefore outside what any code-aware agent can answer on its own.

A practical prompt pattern: before asking an agent to implement a new component, prepend: "Before implementing, list the operational assumptions you are making about [component] — specifically: who owns it, what it costs to run, how it is monitored, and what happens if it fails." The agent can ground some of these answers in your existing code (existing monitoring patterns, similar services it can find), but the organizational and financial constraints are yours to validate. Making assumptions explicit turns the agent's output from a finished proposal into a starting point for the harder conversation.
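The prepend step is simple enough to wrap in a helper so it is applied consistently. This is a sketch: `with_assumptions_check` is a hypothetical name, and the wording is the pattern quoted above with the component filled in.

```python
ASSUMPTIONS_PREAMBLE = (
    "Before implementing, list the operational assumptions you are making about "
    "{component}, specifically: who owns it, what it costs to run, how it is "
    "monitored, and what happens if it fails."
)


def with_assumptions_check(component, task):
    """Prepend the operational-assumptions request to an implementation task."""
    return ASSUMPTIONS_PREAMBLE.format(component=component) + "\n\n" + task


prompt = with_assumptions_check(
    "the notification service",
    "Implement a notification service that sends email on order completion.",
)
print(prompt)
```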

Summary

| What Diagrams Miss | Why It Matters in Production |
| --- | --- |
| Network failure modes | Every arrow has eight independent failure points. Partial failures (where the server wrote successfully but the client never received the response) require idempotency, not just retries. |
| Infrastructure costs | Egress fees, managed service tiers, observability tooling, and idle HA capacity all follow from architectural decisions but never appear on the diagram. Estimate costs before committing to components. |
| Engineering time and maintenance | Each new component adds monthly maintenance hours, on-call burden, and incident response complexity. The true cost of a component is its compute bill plus the engineering time required to run it. |
| Legacy code and constraints | Every real design is constrained by existing systems. AI agents can read your codebase but cannot infer the decisions and constraints behind it. Provide explicit context (frozen schemas, implicit contracts, the reasons behind existing patterns) before prompting for new components. |
| Conway's Law | Your architecture will tend to mirror your org structure whether you intend it to or not. Service ownership, approval processes, and team communication channels shape what is actually buildable, regardless of what the diagram shows. |
| Cache complexity | A cache is not a performance checkbox. It is a consistency boundary, an invalidation problem, a stampede risk, and a new operational dependency. Measure first; cache only when the database is confirmed as the bottleneck. |
| Organizational friction | The hardest part of shipping a new component is often not building it. It is getting approval, aligning with dependent teams, completing security review, and arranging a deployment window. |
| AI agent blindness | AI agents generate components without knowing your budget, the history behind your legacy systems, your team structure, or your operational capacity. The questions diagrams miss are exactly the questions agents cannot answer, and exactly the questions you must ask. |

A diagram is not flawed because it omits these things — it would be unreadable if it tried to capture all of them. But a diagram becomes dangerous when it is treated as a production plan rather than a starting point for a harder conversation: "Given this ideal structure, what does our actual budget, codebase, and organization allow us to build?"

The engineers who bridge that gap reliably are not the ones who draw the best diagrams. They are the ones who know which questions the diagram cannot answer — and who ask those questions before writing a single line of code.
