Foundations: System Design in the Age of AI Coding

System design is the practice of deciding how a system's components — servers, databases, APIs, services — are organized and interact to meet both functional and non-functional requirements. It's the blueprint before the bricks.

AI coding tools — Claude Code, Cursor, GitHub Copilot, and their successors — can generate entire features in seconds. But speed is not the same as quality. The faster AI writes code, the more important it becomes for you to understand whether that code will hold up in production.

This is the central insight behind agentic engineering: system design is no longer primarily about writing code — it's about architecting systems, guiding AI agents with constraints, and knowing how to evaluate what they produce.

From Vibe Coding to Agentic Engineering

In early 2025, Andrej Karpathy coined "vibe coding" — describing the practice of prompting AI to generate code and shipping whatever it produces. The results could be impressive demos. The production reality was often a different story.

By 2026, the industry had coined a successor: agentic engineering. The distinction matters:

Aspect	Vibe Coding	Agentic Engineering
Mindset	Prompt → accept output → ship	Specify → generate → review → validate → ship
System design	Left to the AI	Defined by the engineer upfront
Output quality	Works in demo, may fail at scale	Designed for production from the start
AI role	Author	Implementer under architectural constraints
Engineer role	Prompter	Architect and reviewer

The most important shift: engineering skill moves from writing code to designing systems and reviewing AI output. You don't need to type every line — but you absolutely need to know what good architecture looks like in order to recognize when the AI produces something bad.

What AI Agents Get Wrong

Research consistently shows AI-generated code has predictable failure modes. Understanding these is your first line of defense.

The Four Failure Modes of AI-Generated Code

Studies show nearly half of AI-generated code contains vulnerabilities. The failure patterns are consistent — knowing them lets you review AI output systematically.

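The four failure modes are scalability problems, security vulnerabilities, accumulating technical debt, and monolithic tendencies.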

A note on N+1 queries: This is one of the most common AI-generated performance bugs. It happens when code fetches a list of items (query 1), then loops over each item and fires a separate database query for each one (N more queries). For 100 items, that's 101 database round-trips instead of 1. AI agents generate this pattern naturally because it's intuitive to read and correct in isolation — but catastrophic at scale. The fix is to use a single JOIN query or a batch fetch that loads all related records in one round-trip, then associates them in application memory.
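To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The authors/books schema and both function names are made up for illustration; a real application would likely go through an ORM, where the same fix usually appears as eager loading.

```python
import sqlite3

# Hypothetical schema for illustration: authors and their books.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO books VALUES (1, 1, 'Notes'), (2, 1, 'Engines'), (3, 2, 'Compilers');
""")


def titles_n_plus_one(conn):
    """N+1 pattern: one query for the list, then one more query per author."""
    queries = 0
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries += 1
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join(conn):
    """One round-trip: a JOIN loads every row, grouping happens in memory."""
    rows = conn.execute("""
        SELECT a.name, b.title
        FROM authors a
        LEFT JOIN books b ON b.author_id = a.id
        ORDER BY a.id, b.id
    """).fetchall()
    result = {}
    for name, title in rows:
        titles = result.setdefault(name, [])
        if title is not None:  # authors without books still get an empty list
            titles.append(title)
    return result, 1
```

Both functions return the same mapping of author to titles. The first issues one query per author on top of the initial list query, while the second always issues exactly one, regardless of how many authors exist.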

Non-Functional Requirements: Your Review Checklist

When AI generates code, it focuses on functional requirements — making the feature work. Non-functional requirements (NFRs) describe how well the system works: how fast, how reliably, how securely. They are what separate a demo from a production system, and AI agents consistently underspecify them unless you explicitly ask.

Property	What It Means	What AI Often Misses
Scalability	System handles 10× more load by adding resources	N+1 DB queries, in-memory state that breaks on multiple instances, missing pagination on list endpoints
Performance	Fast response under normal load conditions	Missing database indexes, synchronous blocking calls where async would work, no caching strategy
Reliability	Keeps working when parts fail	No retry logic, no circuit breakers, unhandled edge cases that crash the process
Security	Resistant to attacks and data leaks	Missing input validation, hardcoded secrets, Insecure Direct Object References (IDOR), overly verbose error messages that expose internals
Maintainability	Codebase stays understandable as it grows	Duplicated logic, inconsistent naming, no separation of concerns between layers

IDOR (Insecure Direct Object Reference) is worth calling out specifically: it's a vulnerability where an API exposes a resource directly by its ID (e.g., /api/orders/42) without checking whether the requesting user has permission to access it. AI agents routinely generate the data access logic but omit the authorization check. The fix is always a permission check before returning data: verify that the authenticated user owns or has rights to the requested resource — never trust that a valid ID is sufficient authorization.
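A minimal sketch of that permission check follows. The Order type, the in-memory ORDERS store, and the exception names are all hypothetical stand-ins; in a real service the lookup would hit a database and the user ID would come from the authenticated session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    id: int
    owner_id: int
    total_cents: int


class NotFound(Exception):
    """The requested order does not exist."""


class Forbidden(Exception):
    """The order exists but the caller may not access it."""


# Hypothetical in-memory store standing in for a database table.
ORDERS = {42: Order(id=42, owner_id=7, total_cents=1999)}


def get_order(order_id: int, current_user_id: int) -> Order:
    """Return the order only if the authenticated user owns it."""
    order = ORDERS.get(order_id)
    if order is None:
        raise NotFound(f"order {order_id} not found")
    # The step generated handlers tend to omit: a valid ID is not authorization.
    if order.owner_id != current_user_id:
        raise Forbidden("not allowed to access this order")
    return order
```

Here get_order(42, current_user_id=7) returns the order, while the same ID requested by any other user raises Forbidden. Some APIs deliberately report "not found" in both cases so that callers cannot probe which IDs exist; which behavior you want is a design decision worth stating in your spec.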

The practical workflow: after AI generates code, run through this checklist before accepting it. Ask the AI to review its own output against each property. It will often find its own mistakes when prompted specifically.

Latency vs. Throughput: Know What You're Asking For

AI agents don't inherently optimize for latency or throughput — they optimize for correctness. If performance matters, you need to specify which metric you care about and why.

Latency is how long a single request takes end-to-end. Throughput is how many requests the system handles per second across all users. Optimizing for one often comes at the cost of the other: dedicating resources to serve one request immediately reduces the capacity available for concurrent requests.
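Batching is one common place where this trade-off becomes visible. The toy model below is only a sketch: the 10 ms fixed overhead and 1 ms per-item cost are assumed numbers, and it ignores the time items spend waiting for a batch to fill.

```python
def batch_stats(batch_size: int, overhead_ms: float = 10.0, per_item_ms: float = 1.0):
    """Return (latency_ms, throughput_per_s) for one worker processing batches.

    Toy model with assumed costs: every batch pays a fixed overhead plus a
    per-item cost, and an item's result is available only when its batch ends.
    """
    batch_time_ms = overhead_ms + per_item_ms * batch_size
    latency_ms = batch_time_ms
    throughput_per_s = batch_size / batch_time_ms * 1000.0
    return latency_ms, throughput_per_s


for size in (1, 10, 100):
    latency, throughput = batch_stats(size)
    print(f"batch={size:>3}  latency={latency:6.1f} ms  throughput={throughput:7.1f} items/s")
```

Under these assumptions a batch of 100 items has ten times the latency of a single-item call but also ten times the throughput, because the fixed overhead is shared across the batch. That is the kind of choice an agent will make arbitrarily unless the prompt states which metric matters.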

As a rule of thumb: interactive user-facing features (dashboards, search, checkout) prioritize low latency — users feel delays. Background jobs (report generation, email delivery, data exports) prioritize high throughput — processing more items per minute matters more than how fast any single one finishes.

When prompting an AI agent to build a feature, explicitly state your constraint: "optimize for low latency for interactive user requests" or "optimize for throughput for batch processing jobs." Without this, the AI will make an arbitrary choice.

Spec-Driven Development: Constraining the Agent

The most effective way to get production-quality code from AI agents is to define the architecture before you prompt. This is called spec-driven development — your specification becomes a contract between you and the agent. The agent implements to your spec; your spec enforces the architectural decisions you've already made.

A good spec answers these questions before the AI writes a single line:

What does this component do, and what is explicitly outside its scope?
What are the expected inputs and outputs (types, constraints, error cases)?
What are the performance requirements (latency budget, expected load)?
What existing patterns must this follow (naming conventions, error handling style, data access layer)?
What must never happen (security invariants, data integrity rules)?
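Part of a spec can be written down as types, so that the input and output questions above are checked by code instead of remembered. The sketch below is one possible shape, not a prescribed format: the "list orders" component, its field names, and the 1 to 100 page-size limit are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


class InvalidRequest(ValueError):
    """Spec error case: the input violates a stated constraint."""


@dataclass(frozen=True)
class ListOrdersRequest:
    """Input contract for a hypothetical 'list orders' component."""

    user_id: int                  # must be positive
    page_size: int = 20           # must be 1..100, so results are always paginated
    cursor: Optional[str] = None  # opaque position marker; None means first page

    def __post_init__(self):
        if self.user_id <= 0:
            raise InvalidRequest("user_id must be positive")
        if not 1 <= self.page_size <= 100:
            raise InvalidRequest("page_size must be between 1 and 100")


@dataclass(frozen=True)
class ListOrdersResponse:
    """Output contract: at most page_size order IDs plus a cursor for the next page."""

    order_ids: List[int]
    next_cursor: Optional[str]
```

An agent asked to implement a function from ListOrdersRequest to ListOrdersResponse has far less room to skip pagination or invent its own error handling. Requirements that types cannot express, such as a latency budget or a security invariant, still belong in the written spec.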

The CLAUDE.md pattern is a real-world example: project-level instruction files that persist architectural constraints across every agent session. Without such a document, each new session starts fresh — the agent has no memory of your design decisions and will reinvent them inconsistently across sessions.

Structuring Your Codebase for Agents

How your repository is organized directly affects how well AI agents can work with it. This is a system design consideration that has no purely code-level equivalent.

Codebase Architecture for Agentic Workflows

AI agents work within context windows — the amount of text they can read and reason about at once. A well-structured codebase lets agents understand and modify self-contained modules without needing to load the entire system into context.

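In practice this means keeping modules small (files under roughly 500 lines), using explicit types, and maintaining project-level instruction files.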

Fast Feedback Loops: How Agents Self-Correct

The single biggest multiplier on agentic coding quality is the speed of your feedback loop. Agents self-correct when they can observe the results of their actions quickly. Without fast feedback, an agent can produce a chain of plausible-looking changes that collectively break the system.

Feedback Mechanism	What It Catches	Without It
Automated tests	Regressions, broken contracts, edge cases	Agent ships code that passes the prompt but breaks existing behavior
Type checking	Interface mismatches, missing fields, wrong types	Subtle integration bugs only visible at runtime — often in production
Linting / style rules	Inconsistent patterns, code style drift	Codebase diverges into multiple incompatible styles across sessions
Fast build	Compilation errors, missing imports	Agents iterate blindly without knowing whether the code runs at all
Observability	Runtime errors, slow queries, unexpected behavior	Production issues can't be traced back to the agent change that caused them

Think of tests and linting not just as quality tools, but as guardrails for agents. When you ask an AI to implement a feature, point it at the test suite. An agent that can run npm test after each change catches its own mistakes before you do — without requiring you to manually review every diff. Pre-commit hooks that run type-checking and linting automatically are particularly effective: they make it impossible to accidentally commit code that breaks the type contract, regardless of which agent or session produced it.
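A feedback loop like this can be as small as a script that runs each check in order and stops at the first failure. The sketch below assumes a Node project; the specific commands in PROJECT_CHECKS are examples to replace with whatever your project actually uses, and run_checks is a hypothetical helper name.

```python
import subprocess


def run_checks(checks):
    """Run (name, command) pairs in order; return the first failing name, or None."""
    for name, command in checks:
        completed = subprocess.run(command, capture_output=True, text=True)
        if completed.returncode != 0:
            # Surface the tool's own output so the agent (or you) can act on it.
            print(f"FAILED: {name}\n{completed.stdout}{completed.stderr}")
            return name
        print(f"ok: {name}")
    return None


# Assumed commands for a Node/TypeScript project; substitute your own.
PROJECT_CHECKS = [
    ("types", ["npx", "tsc", "--noEmit"]),
    ("lint", ["npx", "eslint", "."]),
    ("tests", ["npm", "test"]),
]
```

Calling run_checks(PROJECT_CHECKS) from a pre-commit hook, or telling the agent to run it after every change, gives a single pass/fail signal. The checks are ordered cheapest first so that a type error is reported before the slower test suite starts.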

The CAP Theorem Still Applies

AI agents will make database choices for you — and they'll default to whatever is most familiar from their training data (usually PostgreSQL for everything). But the CAP theorem is a fundamental constraint of distributed systems that no amount of code generation can bypass.

The CAP Theorem

In any distributed system, you can fully guarantee only two of three properties: Consistency, Availability, and Partition Tolerance. In practice, network partitions are unavoidable in distributed systems, so the real trade-off is between Consistency and Availability when a partition occurs. Traditional single-master databases like PostgreSQL are commonly placed in the CA category: consistent and available under normal conditions, but not designed to keep operating across a network partition. AI agents default to PostgreSQL without considering whether your use case demands a different trade-off.

Practical note on CAP: The three letters stand for Consistency (every read returns the most recent write), Availability (every request receives a response, even during failures), and Partition Tolerance (the system continues operating when network communication between nodes fails). Because network partitions are unavoidable in real distributed systems, the practical choice is what your system does when a partition occurs: does it stay available and risk returning stale data, or does it stay consistent and refuse to respond until the partition heals?

One important nuance: traditional databases like PostgreSQL are classified as CA because they are designed for single-master deployments where partitions are not expected. When you run PostgreSQL with high-availability replication (e.g., Patroni), the cluster behaves as CP — it prioritizes consistency during failover and may briefly refuse writes. Natively distributed databases like Cassandra and DynamoDB are designed from the ground up as AP: they accept reads and writes across partitions, resolving conflicts afterward. Choose your database based on which guarantee your feature cannot afford to lose.

The key question to ask before accepting any AI-generated data layer: "What happens to this feature when the database is slow, unavailable, or returns stale data?" If the AI hasn't handled these cases, the system is not production-ready.
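One way to answer that question in code is to decide explicitly what a read does when the primary store fails. The sketch below chooses availability with a bound on staleness; the class name, the PrimaryUnavailable exception, and the injected fetch function are hypothetical, and a feature that cannot tolerate stale data would make the opposite choice and simply let the error propagate.

```python
import time


class PrimaryUnavailable(Exception):
    """Raised by the (hypothetical) primary store when it cannot answer."""


class StaleTolerantReader:
    """Read path that favors availability: serve a cached value if the primary fails."""

    def __init__(self, fetch_from_primary, max_stale_seconds, clock=time.monotonic):
        self._fetch = fetch_from_primary
        self._max_stale = max_stale_seconds
        self._clock = clock
        self._cache = {}  # key -> (value, time it was stored)

    def get(self, key):
        """Return (value, is_stale)."""
        try:
            value = self._fetch(key)
        except PrimaryUnavailable:
            cached = self._cache.get(key)
            if cached is not None and self._clock() - cached[1] <= self._max_stale:
                return cached[0], True
            raise  # nothing recent enough: fail instead of serving arbitrarily old data
        self._cache[key] = (value, self._clock())
        return value, False
```

Returning the is_stale flag lets the caller decide how to present the data, for example by showing a "last updated" notice. The max_stale_seconds bound is the part worth putting in your spec, since it states how much staleness the feature can tolerate.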

Back-of-the-Envelope: Validate AI Assumptions

AI agents don't estimate scale — they generate code for the requirements you give them. If you don't specify scale, the agent generates code optimized for one user. Back-of-the-envelope estimation is how you catch this before it reaches production.

The core estimation chain:

Step	Calculation	Result
Start with daily active users (DAU)	given: 1M DAU	1,000,000 users/day
Estimate requests per user per day	1M × 20 requests	20M requests/day
Convert to average RPS	20M ÷ 86,400 seconds	≈ 230 RPS average
Design for peak (3× average)	230 × 3	≈ 700 RPS peak
Size your instances	700 RPS ÷ ~200 RPS/instance	≈ 4 instances minimum

An instance here means one running copy of your application server — a single virtual machine or container handling incoming requests. The ~200 RPS per instance is a rough estimate; the actual number depends on your app's CPU and I/O profile, but 100–300 RPS is a reasonable starting range for a typical API server during estimation.
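The estimation chain above is simple enough to keep as a small function, so the inputs can be varied quickly. This is a sketch: the function name and parameter names are invented here, and the defaults are the table's assumptions (3× peak factor, roughly 200 RPS per instance).

```python
import math

SECONDS_PER_DAY = 86_400


def estimate_capacity(dau, requests_per_user_per_day, peak_factor=3.0, rps_per_instance=200.0):
    """Back-of-the-envelope capacity estimate following the steps in the table."""
    requests_per_day = dau * requests_per_user_per_day
    average_rps = requests_per_day / SECONDS_PER_DAY
    peak_rps = average_rps * peak_factor
    # Round up: a fraction of an instance still requires a whole one.
    instances = math.ceil(peak_rps / rps_per_instance)
    return {
        "requests_per_day": requests_per_day,
        "average_rps": average_rps,
        "peak_rps": peak_rps,
        "instances": instances,
    }


estimate = estimate_capacity(dau=1_000_000, requests_per_user_per_day=20)
print(estimate)
```

With the table's inputs the unrounded values are about 231 RPS on average and about 694 RPS at peak, which the table rounds to 230 and 700; the instance count comes out to 4 either way.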

Useful constants to have ready:

1 day ≈ 86,400 seconds
1 month ≈ 2.6M seconds (30 days × 86,400)
Peak traffic is typically 2–5× the daily average

AI token cost estimation follows the same logic. If your application calls an LLM API, token usage compounds quickly at scale:

Step	Calculation	Result
Tokens per AI request (input + output)	~1,000 input + ~500 output	1,500 tokens/request
AI requests per user per day	1M DAU × 5 AI requests	5M AI requests/day
Daily token usage	5M × 1,500 tokens	7.5B tokens/day
Monthly token usage	7.5B × 30 days	225B tokens/month
Cost at $3 per 1M tokens	225B ÷ 1M × $3	≈ $675,000/month

This is why prompt design, output length, and caching are engineering decisions, not afterthoughts. Reducing average output from 500 to 300 tokens removes 200 tokens from each of 150M monthly requests (30B tokens), which cuts that estimate by roughly $90,000/month at the flat $3 per 1M rate; the saving is larger when output tokens are priced above input tokens. Semantic caching, which returns a stored response for semantically similar queries, can reduce LLM calls by 20–40% in applications with repetitive patterns, such as customer support bots or FAQ assistants. AI agents don't estimate these costs for you. You have to build the estimation habit before committing to a design.
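The token cost chain can be captured the same way. The sketch below uses the table's simplification of a single flat price for input and output tokens; the function and parameter names are invented for illustration, and cache_hit_rate models the share of requests answered from a cache without calling the LLM.

```python
def monthly_llm_cost(
    dau,
    ai_requests_per_user_per_day,
    input_tokens,
    output_tokens,
    price_per_million_tokens,
    cache_hit_rate=0.0,
    days=30,
):
    """Estimate monthly LLM spend in dollars, assuming one flat per-token price."""
    # Requests served from cache never reach the LLM, so they cost no tokens.
    billed_requests_per_day = dau * ai_requests_per_user_per_day * (1.0 - cache_hit_rate)
    tokens_per_month = billed_requests_per_day * (input_tokens + output_tokens) * days
    return tokens_per_month / 1_000_000 * price_per_million_tokens


baseline = monthly_llm_cost(
    dau=1_000_000,
    ai_requests_per_user_per_day=5,
    input_tokens=1_000,
    output_tokens=500,
    price_per_million_tokens=3.0,
)
print(f"${baseline:,.0f}/month")  # prints "$675,000/month", matching the table
```

Changing output_tokens or cache_hit_rate and re-running shows how sensitive the bill is to each lever before any design is committed. Real provider pricing usually differs for input and output tokens, so treat the flat rate as a first approximation.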

When AI generates a solution that stores data in memory, runs synchronous jobs, or makes one database call per item in a list, run the math. A solution that works at 10 users may require a full redesign at 100,000.

Summary

Concept	Why It Matters for AI Coding
Vibe coding vs. agentic engineering	Speed ≠ quality; the engineer's job shifts to architecture and review
Four AI failure modes	Scalability, security, technical debt, monolithic tendencies — review against these every time
Non-functional requirements	AI ignores them unless you specify them; define them in your spec before prompting
Spec-driven development	Define data model, interfaces, and constraints before prompting — this is the agent's contract
Codebase structure	Modular files under ~500 lines, explicit types, instruction files — designed for agent context windows
Fast feedback loops	Tests and type checking are agent guardrails, not just quality tools
CAP theorem	AI defaults to PostgreSQL; you decide the actual consistency trade-off for each feature based on whether the data can tolerate staleness
Back-of-the-envelope math	Validate AI assumptions against real scale — both RPS and AI token costs — before any design decision is locked in

The engineers who get the most out of AI coding tools are not the ones who prompt the most fluently — they're the ones who understand systems deeply enough to catch what the AI gets wrong, and to design the constraints that make the AI get it right in the first place.
