Foundations: System Design in the Age of AI Coding

System design is the practice of deciding how a system's components — servers, databases, APIs, services — are organized and interact to meet both functional and non-functional requirements. It's the blueprint before the bricks.

AI coding tools — Claude Code, Cursor, GitHub Copilot, and their successors — can generate entire features in seconds. But speed is not the same as quality. The faster AI writes code, the more important it becomes for you to understand whether that code will hold up in production.

This is the central insight behind agentic engineering: the engineer's job is no longer primarily writing code. It is architecting systems, guiding AI agents with constraints, and knowing how to evaluate what they produce.

From Vibe Coding to Agentic Engineering

In early 2025, Andrej Karpathy coined "vibe coding" — describing the practice of prompting AI to generate code and shipping whatever it produces. The results could be impressive demos. The production reality was often a different story.

By 2026, the industry had coined a successor: agentic engineering. The distinction matters:

| | Vibe Coding | Agentic Engineering |
|---|---|---|
| Mindset | Prompt → accept output → ship | Specify → generate → review → validate → ship |
| System design | Left to the AI | Defined by the engineer upfront |
| Output quality | Works in demo, may fail at scale | Designed for production from the start |
| AI role | Author | Implementer under architectural constraints |
| Engineer role | Prompter | Architect and reviewer |

The most important shift: engineering skill moves from writing code to designing systems and reviewing AI output. You don't need to type every line — but you absolutely need to know what good architecture looks like in order to recognize when the AI produces something bad.

What AI Agents Get Wrong

Research consistently shows AI-generated code has predictable failure modes. Understanding these is your first line of defense.

The Four Failure Modes of AI-Generated Code

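Some studies have reported security vulnerabilities in nearly half of the AI-generated code they examined. The failure patterns are consistent (scalability problems, security gaps, technical debt, and monolithic tendencies), so knowing them lets you review AI output systematically.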

A note on N+1 queries: This is one of the most common AI-generated performance bugs. It happens when code fetches a list of items (query 1), then loops over each item and fires a separate database query for each one (N more queries). For 100 items, that's 101 database round-trips instead of 1. AI agents generate this pattern naturally because it's intuitive to read and correct in isolation — but catastrophic at scale. The fix is to use a single JOIN query or a batch fetch that loads all related records in one round-trip, then associates them in application memory.
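The difference is easier to see side by side. The following is a minimal sketch using Python's built-in sqlite3 module; the authors/posts schema and all function names are hypothetical, chosen only for illustration. It counts queries for the N+1 version and for a single-JOIN version that groups rows in application memory.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a small in-memory database: 3 authors with 2 posts each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            title TEXT NOT NULL
        );
    """)
    conn.executemany("INSERT INTO authors (id, name) VALUES (?, ?)",
                     [(1, "ada"), (2, "grace"), (3, "linus")])
    conn.executemany("INSERT INTO posts (author_id, title) VALUES (?, ?)",
                     [(a, f"post-{a}-{n}") for a in (1, 2, 3) for n in (1, 2)])
    return conn


def posts_by_author_n_plus_one(conn):
    """Anti-pattern: one query for the list, then one more query per author."""
    queries = 0
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries += 1
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def posts_by_author_joined(conn):
    """Fix: a single JOIN, then group the rows in application memory."""
    rows = conn.execute("""
        SELECT a.name, p.title
        FROM authors AS a
        LEFT JOIN posts AS p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without posts keep an empty list
            result[name].append(title)
    return result, 1


conn = make_db()
slow, slow_queries = posts_by_author_n_plus_one(conn)
fast, fast_queries = posts_by_author_joined(conn)
print(slow == fast, slow_queries, fast_queries)
```

Both functions return the same data; only the number of round-trips differs (N+1 versus 1). With a remote database each round-trip adds network latency, which is why the gap grows with the size of the list.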

Non-Functional Requirements: Your Review Checklist

When AI generates code, it focuses on functional requirements — making the feature work. Non-functional requirements (NFRs) describe how well the system works: how fast, how reliably, how securely. They are what separate a demo from a production system, and AI agents consistently underspecify them unless you explicitly ask.

| Property | What It Means | What AI Often Misses |
|---|---|---|
| Scalability | System handles 10× more load by adding resources | N+1 DB queries, in-memory state that breaks on multiple instances, missing pagination on list endpoints |
| Performance | Fast response under normal load conditions | Missing database indexes, synchronous blocking calls where async would work, no caching strategy |
| Reliability | Keeps working when parts fail | No retry logic, no circuit breakers, unhandled edge cases that crash the process |
| Security | Resistant to attacks and data leaks | Missing input validation, hardcoded secrets, Insecure Direct Object References (IDOR), overly verbose error messages that expose internals |
| Maintainability | Codebase stays understandable as it grows | Duplicated logic, inconsistent naming, no separation of concerns between layers |

IDOR (Insecure Direct Object Reference) is worth calling out specifically: it's a vulnerability where an API exposes a resource directly by its ID (e.g., /api/orders/42) without checking whether the requesting user has permission to access it. AI agents routinely generate the data access logic but omit the authorization check. The fix is always a permission check before returning data: verify that the authenticated user owns or has rights to the requested resource — never trust that a valid ID is sufficient authorization.
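A minimal sketch of the missing check, assuming a hypothetical in-memory ORDERS store in place of a real database and a current_user_id that some authentication layer has already verified. None of these names come from a specific framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    id: int
    owner_id: int
    total_cents: int


class NotFound(Exception):
    """Raised when the order does not exist or the caller may not see it."""


# Hypothetical in-memory store standing in for a real database table.
ORDERS = {
    42: Order(id=42, owner_id=1, total_cents=1999),
    43: Order(id=43, owner_id=2, total_cents=500),
}


def get_order_insecure(order_id: int) -> Order:
    """IDOR: any caller who knows or guesses an ID gets the record."""
    return ORDERS[order_id]


def get_order(order_id: int, current_user_id: int) -> Order:
    """Look up the order, then check ownership before returning it."""
    order = ORDERS.get(order_id)
    # Same error for "missing" and "not yours", so the response
    # does not reveal which IDs exist.
    if order is None or order.owner_id != current_user_id:
        raise NotFound(f"order {order_id} not found")
    return order


print(get_order(42, current_user_id=1).total_cents)
```

Returning the same error for a missing order and for someone else's order is one possible design choice, not the only one. What matters is that the ownership check runs on every access path that takes an ID from the client.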

The practical workflow: after AI generates code, run through this checklist before accepting it. Ask the AI to review its own output against each property. It will often find its own mistakes when prompted specifically.

Latency vs. Throughput: Know What You're Asking For

AI agents don't inherently optimize for latency or throughput — they optimize for correctness. If performance matters, you need to specify which metric you care about and why.

Latency is how long a single request takes end-to-end. Throughput is how many requests the system handles per second across all users. Optimizing for one often comes at the cost of the other: dedicating resources to serve one request immediately reduces the capacity available for concurrent requests.

As a rule of thumb: interactive user-facing features (dashboards, search, checkout) prioritize low latency — users feel delays. Background jobs (report generation, email delivery, data exports) prioritize high throughput — processing more items per minute matters more than how fast any single one finishes.

When prompting an AI agent to build a feature, explicitly state your constraint: "optimize for low latency for interactive user requests" or "optimize for throughput for batch processing jobs." Without this, the AI will make an arbitrary choice.
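One common place the trade-off shows up is batching. The sketch below is a deliberately simplified model, not a benchmark: it assumes a single worker, a fixed per-batch overhead, a fixed per-item cost, and that every item waits for its whole batch to finish. The function name and the numbers are illustrative assumptions.

```python
def batch_stats(batch_size: int, overhead_ms: float, per_item_ms: float):
    """Return (latency_ms, throughput_per_s) for one worker handling fixed-size batches.

    Assumes every item in a batch waits for the whole batch to finish.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    latency_ms = overhead_ms + batch_size * per_item_ms
    throughput_per_s = batch_size / latency_ms * 1000.0
    return latency_ms, throughput_per_s


for size in (1, 10, 100):
    latency, throughput = batch_stats(size, overhead_ms=20.0, per_item_ms=1.0)
    print(f"batch={size:>3}  latency={latency:6.1f} ms  throughput={throughput:7.1f} items/s")
```

Under these assumptions, larger batches amortize the fixed overhead and raise throughput, but each item waits longer. That is why an interactive endpoint and a background job can reasonably want different settings for the same code.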

Spec-Driven Development: Constraining the Agent

The most effective way to get production-quality code from AI agents is to define the architecture before you prompt. This is called spec-driven development — your specification becomes a contract between you and the agent. The agent implements to your spec; your spec enforces the architectural decisions you've already made.

A good spec answers these questions before the AI writes a single line:

  • What does this component do, and what is explicitly outside its scope?
  • What are the expected inputs and outputs (types, constraints, error cases)?
  • What are the performance requirements (latency budget, expected load)?
  • What existing patterns must this follow (naming conventions, error handling style, data access layer)?
  • What must never happen (security invariants, data integrity rules)?
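One way to keep yourself honest about those questions is to treat the spec as data and refuse to prompt until every section is filled in. The following is a hypothetical sketch: ComponentSpec and its field names are invented for illustration and are not part of any tool.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ComponentSpec:
    """One field per question in the checklist above."""
    purpose: str = ""
    out_of_scope: list[str] = field(default_factory=list)
    inputs_outputs: str = ""
    performance: str = ""
    patterns_to_follow: list[str] = field(default_factory=list)
    must_never_happen: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Names of the sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def to_prompt(self) -> str:
        """Render the spec as plain text to prepend to an agent prompt."""
        missing = self.missing_sections()
        if missing:
            raise ValueError(f"spec incomplete, missing: {', '.join(missing)}")
        lines = []
        for f in fields(self):
            value = getattr(self, f.name)
            title = f.name.replace("_", " ").capitalize()
            if isinstance(value, list):
                lines.append(f"{title}:")
                lines.extend(f"  - {item}" for item in value)
            else:
                lines.append(f"{title}: {value}")
        return "\n".join(lines)


spec = ComponentSpec(
    purpose="Return a paginated list of a user's orders",
    out_of_scope=["creating or modifying orders"],
    inputs_outputs="user_id: int, page: int >= 1 -> list of orders",
    performance="p95 latency under 200 ms at 700 RPS",
    patterns_to_follow=["all queries go through the data access layer"],
    must_never_happen=["return an order the caller does not own"],
)
print(spec.to_prompt())
```

The example values are placeholders. The useful part is that an incomplete spec fails loudly before the agent generates anything.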

The CLAUDE.md pattern is a real-world example: project-level instruction files that persist architectural constraints across every agent session. Without such a document, each new session starts fresh — the agent has no memory of your design decisions and will reinvent them inconsistently across sessions.

Structuring Your Codebase for Agents

How your repository is organized directly affects how well AI agents can work with it. This is a system design consideration that has no purely code-level equivalent.

Codebase Architecture for Agentic Workflows

AI agents work within context windows — the amount of text they can read and reason about at once. A well-structured codebase lets agents understand and modify self-contained modules without needing to load the entire system into context.

Fast Feedback Loops: How Agents Self-Correct

The single biggest multiplier on agentic coding quality is the speed of your feedback loop. Agents self-correct when they can observe the results of their actions quickly. Without fast feedback, an agent can produce a chain of plausible-looking changes that collectively break the system.

| Feedback Mechanism | What It Catches | Without It |
|---|---|---|
| Automated tests | Regressions, broken contracts, edge cases | Agent ships code that passes the prompt but breaks existing behavior |
| Type checking | Interface mismatches, missing fields, wrong types | Subtle integration bugs only visible at runtime, often in production |
| Linting / style rules | Inconsistent patterns, code style drift | Codebase diverges into multiple incompatible styles across sessions |
| Fast build | Compilation errors, missing imports | Agents iterate blindly without knowing whether the code runs at all |
| Observability | Runtime errors, slow queries, unexpected behavior | Production issues can't be traced back to the agent change that caused them |

Think of tests and linting not just as quality tools, but as guardrails for agents. When you ask an AI to implement a feature, point it at the test suite. An agent that can run npm test after each change catches its own mistakes before you do — without requiring you to manually review every diff. Pre-commit hooks that run type-checking and linting automatically are particularly effective: they make it impossible to accidentally commit code that breaks the type contract, regardless of which agent or session produced it.
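A feedback loop like this can be as simple as a script that runs each check in order and stops at the first failure, so the agent (or a hook) gets one clear signal. The sketch below is an assumption about how you might wire it up, not a standard tool. The demo uses small Python one-liners as stand-ins for real commands such as npm test so that it runs anywhere.

```python
import subprocess
import sys


def run_checks(checks):
    """Run each (name, command) pair in order and stop at the first failure.

    Returns (ok, failed_check_name_or_None, output_of_last_check_run).
    """
    output = ""
    for name, command in checks:
        proc = subprocess.run(command, capture_output=True, text=True)
        output = proc.stdout + proc.stderr
        if proc.returncode != 0:
            return False, name, output
    return True, None, output


# Stand-in commands; in a real project these might be ["npm", "test"]
# and the project's own type-check and lint commands.
demo_checks = [
    ("types", [sys.executable, "-c", "print('types ok')"]),
    ("tests", [sys.executable, "-c", "import sys; print('1 test failed'); sys.exit(1)"]),
    ("lint", [sys.executable, "-c", "print('never reached')"]),
]

ok, failed, output = run_checks(demo_checks)
print(ok, failed, output.strip())
```

Ordering the checks from cheapest to most expensive keeps the loop fast: a type error is reported in seconds instead of after a full test run.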

The CAP Theorem Still Applies

AI agents will make database choices for you — and they'll default to whatever is most familiar from their training data (usually PostgreSQL for everything). But the CAP theorem is a fundamental constraint of distributed systems that no amount of code generation can bypass.

The CAP Theorem

In any distributed system, you can only fully guarantee two of three properties: Consistency, Availability, and Partition Tolerance. In practice, network partitions are unavoidable in distributed systems — so the real trade-off is between Consistency and Availability when a partition occurs. Traditional single-master databases like PostgreSQL sit in the CA category: consistent and available under normal conditions, but not designed to operate across a network partition. AI agents default to PostgreSQL without considering whether your use case demands a different trade-off.

Practical note on CAP: The three letters stand for Consistency (every read returns the most recent write), Availability (every request receives a response, even during failures), and Partition Tolerance (the system continues operating when network communication between nodes fails). Because network partitions are unavoidable in real distributed systems, the practical choice is what your system does when a partition occurs: does it stay available and risk returning stale data, or does it stay consistent and refuse to respond until the partition heals?

One important nuance: traditional databases like PostgreSQL are classified as CA because they are designed for single-master deployments where partitions are not expected. When you run PostgreSQL with high-availability replication (e.g., Patroni), the cluster behaves as CP — it prioritizes consistency during failover and may briefly refuse writes. Natively distributed databases like Cassandra and DynamoDB are designed from the ground up as AP: they accept reads and writes across partitions, resolving conflicts afterward. Choose your database based on which guarantee your feature cannot afford to lose.

The key question to ask before accepting any AI-generated data layer: "What happens to this feature when the database is slow, unavailable, or returns stale data?" If the AI hasn't handled these cases, the system is not production-ready.
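At the level of a single read path, answering that question means making the failure policy explicit. The sketch below is illustrative only: CachedReader, Unavailable, and max_stale_s are hypothetical names, and it assumes the fetch function enforces its own timeout and raises when the database is slow or down. It loosely mirrors the availability-versus-consistency choice; it is not an implementation of a distributed database.

```python
import time
from typing import Callable, Optional


class Unavailable(Exception):
    """Raised when no answer can be given within the chosen guarantees."""


class CachedReader:
    """Wraps a primary read with an explicit policy for failures."""

    def __init__(self, fetch: Callable[[str], str], max_stale_s: Optional[float]):
        # max_stale_s=None means "never serve stale data" (favor consistency).
        self._fetch = fetch
        self._max_stale_s = max_stale_s
        self._cache: dict[str, tuple[str, float]] = {}

    def read(self, key: str, now: Optional[float] = None) -> tuple[str, bool]:
        """Return (value, is_stale)."""
        now = time.monotonic() if now is None else now
        try:
            value = self._fetch(key)
        except Exception as exc:
            cached = self._cache.get(key)
            if (self._max_stale_s is not None and cached is not None
                    and now - cached[1] <= self._max_stale_s):
                return cached[0], True  # stay available, flag the staleness
            raise Unavailable(f"cannot read {key!r}") from exc
        self._cache[key] = (value, now)
        return value, False


db_up = {"value": True}


def demo_fetch(key: str) -> str:
    """Stand-in for a database call that fails while the database is down."""
    if not db_up["value"]:
        raise ConnectionError("database unreachable")
    return f"fresh:{key}"


reader = CachedReader(demo_fetch, max_stale_s=60.0)
print(reader.read("profile:1", now=0.0))
db_up["value"] = False
print(reader.read("profile:1", now=30.0))
```

Whether stale data is acceptable depends on the feature: a product description can usually tolerate it, an account balance usually cannot. Passing max_stale_s=None makes the same reader fail instead of serving an old value.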

Back-of-the-Envelope: Validate AI Assumptions

AI agents don't estimate scale — they generate code for the requirements you give them. If you don't specify scale, the agent generates code optimized for one user. Back-of-the-envelope estimation is how you catch this before it reaches production.

The core estimation chain:

| Step | Calculation | Result |
|---|---|---|
| Start with daily active users (DAU) | given: 1M DAU | 1,000,000 users/day |
| Estimate requests per user per day | 1M × 20 requests | 20M requests/day |
| Convert to average RPS | 20M ÷ 86,400 seconds | ≈ 230 RPS average |
| Design for peak (3× average) | 230 × 3 | ≈ 700 RPS peak |
| Size your instances | 700 RPS ÷ ~200 RPS/instance | ≈ 4 instances minimum |

An instance here means one running copy of your application server — a single virtual machine or container handling incoming requests. The ~200 RPS per instance is a rough estimate; the actual number depends on your app's CPU and I/O profile, but 100–300 RPS is a reasonable starting range for a typical API server during estimation.

Useful constants to have ready:

  • 1 day ≈ 86,400 seconds
  • 1 month (30 days) ≈ 2.6M seconds
  • Peak traffic is typically 2–5× the daily average
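The estimation chain can be written as a few lines of arithmetic so the inputs are easy to vary. This is a sketch using the figures from the table; the function and parameter names are illustrative, and the 200 RPS per instance figure remains a rough assumption.

```python
import math

SECONDS_PER_DAY = 86_400


def estimate_capacity(dau, requests_per_user, peak_factor, rps_per_instance):
    """Return (average_rps, peak_rps, instances) without intermediate rounding."""
    average_rps = dau * requests_per_user / SECONDS_PER_DAY
    peak_rps = average_rps * peak_factor
    instances = math.ceil(peak_rps / rps_per_instance)
    return average_rps, peak_rps, instances


avg, peak, instances = estimate_capacity(
    dau=1_000_000, requests_per_user=20, peak_factor=3, rps_per_instance=200
)
print(f"average ~{avg:.0f} RPS, peak ~{peak:.0f} RPS, instances >= {instances}")
```

The unrounded values (about 231 and 694 RPS) differ slightly from the table's rounded 230 and 700, but the instance count comes out the same. Try a peak factor of 5 or a lower per-instance capacity to see how sensitive the answer is to those two guesses.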

AI token cost estimation follows the same logic. If your application calls an LLM API, token usage compounds quickly at scale:

| Step | Calculation | Result |
|---|---|---|
| Tokens per AI request (input + output) | ~1,000 input + ~500 output | 1,500 tokens/request |
| AI requests per user per day | 1M DAU × 5 AI requests | 5M AI requests/day |
| Daily token usage | 5M × 1,500 tokens | 7.5B tokens/day |
| Monthly token usage | 7.5B × 30 days | 225B tokens/month |
| Cost at $3 per 1M tokens | 225B ÷ 1M × $3 | ≈ $675,000/month |

This is why prompt design, output length, and caching are engineering decisions, not afterthoughts. Reducing average output from 500 to 300 tokens removes 200 tokens per request, or 30B tokens per month, which cuts that estimate by roughly $90,000/month at the flat $3 rate (more if output tokens are priced higher than input tokens, as is common). Semantic caching, which returns a stored response for semantically similar queries, can reduce LLM calls by 20–40% in applications with repetitive patterns, such as customer support bots or FAQ assistants. AI agents don't estimate these costs for you. You have to build the estimation habit before committing to a design.
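The token arithmetic is worth keeping as a small function so you can test changes to output length or request volume before committing to a design. This sketch assumes, as the table does, one flat price for input and output tokens; the function and parameter names are illustrative.

```python
def monthly_llm_cost(dau, ai_requests_per_user, input_tokens, output_tokens,
                     usd_per_million_tokens, days=30):
    """Monthly cost in USD, assuming one flat price for input and output tokens."""
    tokens_per_day = dau * ai_requests_per_user * (input_tokens + output_tokens)
    return tokens_per_day * days / 1_000_000 * usd_per_million_tokens


baseline = monthly_llm_cost(1_000_000, 5, 1_000, 500, 3.0)
shorter = monthly_llm_cost(1_000_000, 5, 1_000, 300, 3.0)
print(f"baseline ${baseline:,.0f}, shorter output ${shorter:,.0f}, "
      f"saved ${baseline - shorter:,.0f}")
```

Real pricing often charges input and output tokens at different rates, so treat the result as an order-of-magnitude check and substitute your provider's actual prices.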

When AI generates a solution that stores data in memory, runs synchronous jobs, or makes one database call per item in a list, run the math. A solution that works at 10 users may require a full redesign at 100,000.

Summary

| Concept | Why It Matters for AI Coding |
|---|---|
| Vibe coding vs. agentic engineering | Speed ≠ quality; the engineer's job shifts to architecture and review |
| Four AI failure modes | Scalability, security, technical debt, monolithic tendencies; review against these every time |
| Non-functional requirements | AI ignores them unless you specify them; define them in your spec before prompting |
| Spec-driven development | Define data model, interfaces, and constraints before prompting; this is the agent's contract |
| Codebase structure | Modular files under ~500 lines, explicit types, instruction files; designed for agent context windows |
| Fast feedback loops | Tests and type checking are agent guardrails, not just quality tools |
| CAP theorem | AI defaults to PostgreSQL; you decide the actual consistency trade-off for each feature based on whether the data can tolerate staleness |
| Back-of-the-envelope math | Validate AI assumptions against real scale (both RPS and AI token costs) before any design decision is locked in |

The engineers who get the most out of AI coding tools are not the ones who prompt the most fluently — they're the ones who understand systems deeply enough to catch what the AI gets wrong, and to design the constraints that make the AI get it right in the first place.
