Foundations: System Design in the Age of AI Coding
System design is the practice of deciding how a system's components — servers, databases, APIs, services — are organized and interact to meet both functional and non-functional requirements. It's the blueprint before the bricks.
AI coding tools — Claude Code, Cursor, GitHub Copilot, and their successors — can generate entire features in seconds. But speed is not the same as quality. The faster AI writes code, the more important it becomes for you to understand whether that code will hold up in production.
This is the central insight behind agentic engineering: system design is no longer primarily about writing code — it's about architecting systems, guiding AI agents with constraints, and knowing how to evaluate what they produce.
From Vibe Coding to Agentic Engineering
In early 2025, Andrej Karpathy coined "vibe coding" — describing the practice of prompting AI to generate code and shipping whatever it produces. The results could be impressive demos. The production reality was often a different story.
By 2026, the industry had coined a successor: agentic engineering. The distinction matters:
| | Vibe Coding | Agentic Engineering |
|---|---|---|
| Mindset | Prompt → accept output → ship | Specify → generate → review → validate → ship |
| System design | Left to the AI | Defined by the engineer upfront |
| Output quality | Works in demo, may fail at scale | Designed for production from the start |
| AI role | Author | Implementer under architectural constraints |
| Engineer role | Prompter | Architect and reviewer |
The most important shift: engineering skill moves from writing code to designing systems and reviewing AI output. You don't need to type every line — but you absolutely need to know what good architecture looks like in order to recognize when the AI produces something bad.
What AI Agents Get Wrong
Research consistently shows AI-generated code has predictable failure modes. Understanding these is your first line of defense.
The Four Failure Modes of AI-Generated Code
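Studies show nearly half of AI-generated code contains vulnerabilities. The failure patterns are consistent and fall into four groups: scalability problems, security gaps, accumulated technical debt, and monolithic tendencies. Knowing them lets you review AI output systematically.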
A note on N+1 queries: This is one of the most common AI-generated performance bugs. It happens when code fetches a list of items (query 1), then loops over each item and fires a separate database query for each one (N more queries). For 100 items, that's 101 database round-trips instead of 1. AI agents generate this pattern naturally because it's intuitive to read and correct in isolation — but catastrophic at scale. The fix is to use a single JOIN query or a batch fetch that loads all related records in one round-trip, then associates them in application memory.
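To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The authors and posts tables, their columns, and the function names are hypothetical, chosen only for illustration. A real application would likely go through an ORM, where the same pattern tends to hide behind lazily loaded relationships.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database: 3 authors with 2 posts each (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO authors VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace"), (3, "Linus")],
    )
    conn.executemany(
        "INSERT INTO posts (author_id, title) VALUES (?, ?)",
        [(a, f"post-{a}-{n}") for a in (1, 2, 3) for n in (1, 2)],
    )
    return conn


def posts_n_plus_one(conn: sqlite3.Connection):
    """N+1 pattern: one query for the list, then one more query per author."""
    queries = 0
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries += 1
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def posts_single_join(conn: sqlite3.Connection):
    """Same data in one round-trip, grouped in application memory."""
    rows = conn.execute(
        """
        SELECT a.name, p.title
        FROM authors a
        LEFT JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
        """
    ).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without posts still appear, with an empty list
            result[name].append(title)
    return result, 1
```

Both functions return the same mapping of author to post titles. The first issues one query per author on top of the initial list query, while the second always issues exactly one, however many authors there are.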
Non-Functional Requirements: Your Review Checklist
When AI generates code, it focuses on functional requirements — making the feature work. Non-functional requirements (NFRs) describe how well the system works: how fast, how reliably, how securely. They are what separate a demo from a production system, and AI agents consistently underspecify them unless you explicitly ask.
| Property | What It Means | What AI Often Misses |
|---|---|---|
| Scalability | System handles 10× more load by adding resources | N+1 DB queries, in-memory state that breaks on multiple instances, missing pagination on list endpoints |
| Performance | Fast response under normal load conditions | Missing database indexes, synchronous blocking calls where async would work, no caching strategy |
| Reliability | Keeps working when parts fail | No retry logic, no circuit breakers, unhandled edge cases that crash the process |
| Security | Resistant to attacks and data leaks | Missing input validation, hardcoded secrets, Insecure Direct Object References (IDOR), overly verbose error messages that expose internals |
| Maintainability | Codebase stays understandable as it grows | Duplicated logic, inconsistent naming, no separation of concerns between layers |
IDOR (Insecure Direct Object Reference) is worth calling out specifically: it's a vulnerability where an API exposes a resource directly by its ID (e.g., /api/orders/42) without checking whether the requesting user has permission to access it. AI agents routinely generate the data access logic but omit the authorization check. The fix is always a permission check before returning data: verify that the authenticated user owns or has rights to the requested resource — never trust that a valid ID is sufficient authorization.
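A minimal sketch of the difference, assuming an in-memory ORDERS store and an already-authenticated current_user_id. Every name here (Order, ORDERS, NotFound, Forbidden, both functions) is hypothetical and stands in for whatever your framework and data layer provide.

```python
from dataclasses import dataclass


class NotFound(Exception):
    """Raised when no order exists for the requested ID."""


class Forbidden(Exception):
    """Raised when the caller is not allowed to see the order."""


@dataclass(frozen=True)
class Order:
    id: int
    owner_id: int
    total_cents: int


# Stand-in for a database table (illustrative data only).
ORDERS = {42: Order(id=42, owner_id=7, total_cents=1999)}


def get_order_insecure(order_id: int) -> Order:
    """IDOR: anyone who knows or guesses the ID gets the record."""
    order = ORDERS.get(order_id)
    if order is None:
        raise NotFound(order_id)
    return order


def get_order(order_id: int, current_user_id: int) -> Order:
    """Same lookup, but ownership is verified before any data is returned."""
    order = ORDERS.get(order_id)
    if order is None:
        raise NotFound(order_id)
    if order.owner_id != current_user_id:
        raise Forbidden(order_id)
    return order
```

Some APIs deliberately answer with "not found" instead of "forbidden" so that an attacker cannot learn which IDs exist. Either way, the ownership check runs before the data leaves the function.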
The practical workflow: after AI generates code, run through this checklist before accepting it. Ask the AI to review its own output against each property. It will often find its own mistakes when prompted specifically.
Latency vs. Throughput: Know What You're Asking For
AI agents don't inherently optimize for latency or throughput — they optimize for correctness. If performance matters, you need to specify which metric you care about and why.
Latency is how long a single request takes end-to-end. Throughput is how many requests the system handles per second across all users. Optimizing for one often comes at the cost of the other: dedicating resources to serve one request immediately reduces the capacity available for concurrent requests.
As a rule of thumb: interactive user-facing features (dashboards, search, checkout) prioritize low latency — users feel delays. Background jobs (report generation, email delivery, data exports) prioritize high throughput — processing more items per minute matters more than how fast any single one finishes.
When prompting an AI agent to build a feature, explicitly state your constraint: "optimize for low latency for interactive user requests" or "optimize for throughput for batch processing jobs." Without this, the AI will make an arbitrary choice.
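Batching is one simple way to see the trade-off in numbers. The sketch below is a toy model, not a benchmark: it assumes a single worker that pays a fixed overhead per batch plus a fixed cost per item, and the 50 ms and 5 ms defaults are made-up figures.

```python
def batch_stats(
    batch_size: int,
    overhead_ms: float = 50.0,
    per_item_ms: float = 5.0,
) -> tuple[float, float]:
    """Return (latency_ms, throughput_items_per_s) for one worker.

    Latency is the time until the whole batch finishes, which is how long
    every item in that batch waits for its result.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    latency_ms = overhead_ms + batch_size * per_item_ms
    throughput_per_s = batch_size / (latency_ms / 1000.0)
    return latency_ms, throughput_per_s


if __name__ == "__main__":
    for size in (1, 10, 100):
        latency, throughput = batch_stats(size)
        print(f"batch={size:>3}  latency={latency:6.1f} ms  throughput={throughput:6.1f} items/s")
```

Under these assumptions, larger batches spread the fixed overhead across more items, so throughput rises, but each item waits longer for its result. That is why the same code path rarely suits both a checkout request and a nightly export.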
Spec-Driven Development: Constraining the Agent
The most effective way to get production-quality code from AI agents is to define the architecture before you prompt. This is called spec-driven development — your specification becomes a contract between you and the agent. The agent implements to your spec; your spec enforces the architectural decisions you've already made.
A good spec answers these questions before the AI writes a single line:
- What does this component do, and what is explicitly outside its scope?
- What are the expected inputs and outputs (types, constraints, error cases)?
- What are the performance requirements (latency budget, expected load)?
- What existing patterns must this follow (naming conventions, error handling style, data access layer)?
- What must never happen (security invariants, data integrity rules)?
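One lightweight way to hold yourself to these questions is to capture the spec as structured data and refuse to prompt until every section is filled in. The sketch below is only an illustration: ComponentSpec, its field names, and missing_sections are hypothetical, and a plain text file with the same headings would work just as well.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ComponentSpec:
    """One field per question a spec should answer (hypothetical structure)."""

    purpose: str = ""
    out_of_scope: list[str] = field(default_factory=list)
    inputs_outputs: str = ""
    performance: str = ""
    patterns_to_follow: list[str] = field(default_factory=list)
    invariants: list[str] = field(default_factory=list)


def missing_sections(spec: ComponentSpec) -> list[str]:
    """Return the names of sections that are still empty, in declaration order."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name)]


if __name__ == "__main__":
    draft = ComponentSpec(
        purpose="Return the authenticated user's orders, newest first",
        performance="p95 under 200 ms at 700 RPS",
    )
    print("Fill these in before prompting:", missing_sections(draft))
```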
The CLAUDE.md pattern is a real-world example: project-level instruction files that persist architectural constraints across every agent session. Without such a document, each new session starts fresh — the agent has no memory of your design decisions and will reinvent them inconsistently across sessions.
Structuring Your Codebase for Agents
How your repository is organized directly affects how well AI agents can work with it. This is a system design consideration that has no purely code-level equivalent.
Codebase Architecture for Agentic Workflows
AI agents work within context windows — the amount of text they can read and reason about at once. A well-structured codebase lets agents understand and modify self-contained modules without needing to load the entire system into context.
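One way to keep modules small enough to fit comfortably in an agent's context is to flag files that have grown too large. The helper below is an assumed sketch, not part of any existing tool. Its 500-line default mirrors the rough figure given in this guide's summary, and the suffix list is an arbitrary example.

```python
from pathlib import Path


def oversized_files(
    root: str,
    max_lines: int = 500,
    suffixes: tuple[str, ...] = (".py", ".ts"),
) -> list[tuple[str, int]]:
    """Return (path, line_count) for source files under root that exceed max_lines."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Count lines lazily so large files are not loaded into memory at once.
        with path.open(encoding="utf-8", errors="ignore") as handle:
            count = sum(1 for _ in handle)
        if count > max_lines:
            hits.append((str(path), count))
    return hits
```

Run as part of CI or a pre-commit hook, a check like this turns "keep modules small" from a convention into something both agents and humans get feedback on.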
Fast Feedback Loops: How Agents Self-Correct
The single biggest multiplier on agentic coding quality is the speed of your feedback loop. Agents self-correct when they can observe the results of their actions quickly. Without fast feedback, an agent can produce a chain of plausible-looking changes that collectively break the system.
| Feedback Mechanism | What It Catches | Without It |
|---|---|---|
| Automated tests | Regressions, broken contracts, edge cases | Agent ships code that passes the prompt but breaks existing behavior |
| Type checking | Interface mismatches, missing fields, wrong types | Subtle integration bugs only visible at runtime — often in production |
| Linting / style rules | Inconsistent patterns, code style drift | Codebase diverges into multiple incompatible styles across sessions |
| Fast build | Compilation errors, missing imports | Agents iterate blindly without knowing whether the code runs at all |
| Observability | Runtime errors, slow queries, unexpected behavior | Production issues can't be traced back to the agent change that caused them |
Think of tests and linting not just as quality tools, but as guardrails for agents. When you ask an AI to implement a feature, point it at the test suite. An agent that can run npm test after each change catches its own mistakes before you do — without requiring you to manually review every diff. Pre-commit hooks that run type-checking and linting automatically are particularly effective: they make it impossible to accidentally commit code that breaks the type contract, regardless of which agent or session produced it.
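As a rough sketch of such a guardrail, the script below runs a list of checks in order and stops at the first failure, so an agent (or a hook) gets one clear signal to act on. The run_checks helper is hypothetical, and the commands in EXAMPLE_CHECKS assume a TypeScript project with tsc and ESLint installed; substitute whatever your project actually uses.

```python
import subprocess

# Assumed project tooling; these commands are examples, not requirements.
EXAMPLE_CHECKS = [
    ("types", ["npx", "tsc", "--noEmit"]),
    ("lint", ["npx", "eslint", "."]),
    ("tests", ["npm", "test"]),
]


def run_checks(checks):
    """Run (name, command) pairs in order and stop at the first failure.

    Returns (ok, failed_check_name, combined_output_of_failed_check).
    """
    for name, command in checks:
        proc = subprocess.run(command, capture_output=True, text=True)
        if proc.returncode != 0:
            return False, name, proc.stdout + proc.stderr
    return True, None, ""


if __name__ == "__main__":
    ok, failed, output = run_checks(EXAMPLE_CHECKS)
    if not ok:
        print(f"Check '{failed}' failed:\n{output}")
        raise SystemExit(1)
    print("All checks passed")
```

Ordering the cheap checks first keeps the loop fast: a type error surfaces in seconds, before the full test suite has to run.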
The CAP Theorem Still Applies
AI agents will make database choices for you — and they'll default to whatever is most familiar from their training data (usually PostgreSQL for everything). But the CAP theorem is a fundamental constraint of distributed systems that no amount of code generation can bypass.
The CAP Theorem
In any distributed system, you can only fully guarantee two of three properties: Consistency, Availability, and Partition Tolerance. In practice, network partitions are unavoidable in distributed systems — so the real trade-off is between Consistency and Availability when a partition occurs. Traditional single-master databases like PostgreSQL sit in the CA category: consistent and available under normal conditions, but not designed to operate across a network partition. AI agents default to PostgreSQL without considering whether your use case demands a different trade-off.
Practical note on CAP: The three letters stand for Consistency (every read returns the most recent write), Availability (every request to a working node receives a response, even while other parts of the system are failing), and Partition Tolerance (the system continues operating when network communication between nodes fails). The practical choice is therefore what your system does when a partition occurs: does it stay available and risk returning stale data, or does it stay consistent and refuse to respond until the partition heals?
One important nuance: traditional databases like PostgreSQL are classified as CA because they are designed for single-master deployments where partitions are not expected. When you run PostgreSQL with high-availability replication (e.g., Patroni), the cluster behaves as CP — it prioritizes consistency during failover and may briefly refuse writes. Natively distributed databases like Cassandra and DynamoDB are designed from the ground up as AP: they accept reads and writes across partitions, resolving conflicts afterward. Choose your database based on which guarantee your feature cannot afford to lose.
The key question to ask before accepting any AI-generated data layer: "What happens to this feature when the database is slow, unavailable, or returns stale data?" If the AI hasn't handled these cases, the system is not production-ready.
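Here is one possible answer to that question, sketched under stated assumptions: when the primary read fails, serve the last known value if it is recent enough, and tell the caller it is stale. StaleCacheReader, its parameters, and the choice of which exceptions count as "unavailable" are all hypothetical. This is the availability-leaning choice, and it is wrong for data that must never be stale, such as account balances.

```python
import time


class StaleCacheReader:
    """Wrap a fetch function and fall back to a recent cached value on failure."""

    def __init__(self, fetch, max_stale_seconds: float = 300.0, clock=time.monotonic):
        self._fetch = fetch              # callable: key -> value, may raise
        self._max_stale = max_stale_seconds
        self._clock = clock              # injectable so tests can control time
        self._cache = {}                 # key -> (value, stored_at)

    def get(self, key):
        """Return (value, is_stale). Re-raises if no acceptable fallback exists."""
        try:
            value = self._fetch(key)
        except (ConnectionError, TimeoutError):
            cached = self._cache.get(key)
            if cached is None:
                raise  # nothing to fall back to
            value, stored_at = cached
            if self._clock() - stored_at > self._max_stale:
                raise  # cached copy is too old to be trusted
            return value, True
        self._cache[key] = (value, self._clock())
        return value, False
```

Returning the is_stale flag instead of hiding the fallback lets the calling feature decide what to do, for example showing a "last updated" notice on a dashboard.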
Back-of-the-Envelope: Validate AI Assumptions
AI agents don't estimate scale — they generate code for the requirements you give them. If you don't specify scale, the agent generates code optimized for one user. Back-of-the-envelope estimation is how you catch this before it reaches production.
The core estimation chain:
| Step | Calculation | Result |
|---|---|---|
| Start with daily active users (DAU) | given: 1M DAU | 1,000,000 users/day |
| Estimate requests per user per day | 1M × 20 requests | 20M requests/day |
| Convert to average RPS | 20M ÷ 86,400 seconds | ≈ 230 RPS average |
| Design for peak (3× average) | 230 × 3 | ≈ 700 RPS peak |
| Size your instances | 700 RPS ÷ ~200 RPS/instance | ≈ 4 instances minimum |
An instance here means one running copy of your application server — a single virtual machine or container handling incoming requests. The ~200 RPS per instance is a rough estimate; the actual number depends on your app's CPU and I/O profile, but 100–300 RPS is a reasonable starting range for a typical API server during estimation.
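The estimation chain is simple enough to keep as a small function and rerun whenever an assumption changes. This is a sketch: estimate_capacity and its parameter names are illustrative, and the defaults (3× peak factor, 200 RPS per instance) are the same rough assumptions used in the table, not measured values.

```python
import math

SECONDS_PER_DAY = 86_400


def estimate_capacity(
    dau: int,
    requests_per_user: float,
    peak_factor: float = 3.0,
    rps_per_instance: float = 200.0,
) -> dict:
    """Walk the chain: DAU -> requests/day -> average RPS -> peak RPS -> instances."""
    requests_per_day = dau * requests_per_user
    avg_rps = requests_per_day / SECONDS_PER_DAY
    peak_rps = avg_rps * peak_factor
    return {
        "requests_per_day": requests_per_day,
        "avg_rps": avg_rps,
        "peak_rps": peak_rps,
        # Round up: a fractional instance still needs a whole machine.
        "instances": math.ceil(peak_rps / rps_per_instance),
    }


if __name__ == "__main__":
    estimate = estimate_capacity(dau=1_000_000, requests_per_user=20)
    print({name: round(value) for name, value in estimate.items()})
```

With the table's inputs this gives about 231 RPS on average, about 694 RPS at peak, and 4 instances, matching the rounded figures above. In practice you would add headroom so that losing one instance does not push the remaining ones past capacity.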
Useful constants to have ready:
- 1 day ≈ 86,400 seconds
- 1 month (30 days) ≈ 2.6M seconds
- Peak traffic is typically 2–5× the daily average
AI token cost estimation follows the same logic. If your application calls an LLM API, token usage compounds quickly at scale:
| Step | Calculation | Result |
|---|---|---|
| Tokens per AI request (input + output) | ~1,000 input + ~500 output | 1,500 tokens/request |
| AI requests per user per day | 1M DAU × 5 AI requests | 5M AI requests/day |
| Daily token usage | 5M × 1,500 tokens | 7.5B tokens/day |
| Monthly token usage | 7.5B × 30 days | 225B tokens/month |
| Cost at $3 per 1M tokens | 225B ÷ 1M × $3 | ≈ $675,000/month |
This is why prompt design, output length, and caching are engineering decisions, not afterthoughts. Reducing average output from 500 to 300 tokens removes 200 tokens from each of the 5M daily requests, about 30B tokens a month, which cuts that estimate by roughly $90,000/month at the flat $3 rate (and by more where output tokens are priced higher than input tokens). Semantic caching, which returns a stored response for semantically similar queries, can reduce LLM calls by 20–40% in applications with repetitive patterns, such as customer support bots or FAQ assistants. AI agents don't estimate these costs for you. You have to build the estimation habit before committing to a design.
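The token arithmetic from the table can be wrapped the same way, which makes it easy to test levers such as caching. In this sketch the function name and parameters are hypothetical, the price is the table's flat $3 per 1M tokens (real pricing usually differs for input and output), and the 30% cache hit rate is simply the midpoint of the 20–40% range quoted above.

```python
def monthly_token_cost(
    dau: int,
    ai_requests_per_user: float,
    input_tokens: int,
    output_tokens: int,
    usd_per_million_tokens: float = 3.0,
    cache_hit_rate: float = 0.0,
    days: int = 30,
) -> float:
    """Estimate monthly LLM spend; cached requests are assumed to cost nothing."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    llm_calls_per_day = dau * ai_requests_per_user * (1.0 - cache_hit_rate)
    tokens_per_day = llm_calls_per_day * (input_tokens + output_tokens)
    return tokens_per_day * days / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    baseline = monthly_token_cost(1_000_000, 5, 1_000, 500)
    cached = monthly_token_cost(1_000_000, 5, 1_000, 500, cache_hit_rate=0.3)
    print(f"baseline: ${baseline:,.0f}/month")
    print(f"with 30% cache hits: ${cached:,.0f}/month")
```

The baseline call reproduces the table's $675,000/month. Under the assumed 30% hit rate the estimate falls to about $472,500/month, before accounting for the cost of running the cache itself.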
When AI generates a solution that stores data in memory, runs synchronous jobs, or makes one database call per item in a list, run the math. A solution that works at 10 users may require a full redesign at 100,000.
Summary
| Concept | Why It Matters for AI Coding |
|---|---|
| Vibe coding vs. agentic engineering | Speed ≠ quality; the engineer's job shifts to architecture and review |
| Four AI failure modes | Scalability, security, technical debt, monolithic tendencies — review against these every time |
| Non-functional requirements | AI ignores them unless you specify them; define them in your spec before prompting |
| Spec-driven development | Define data model, interfaces, and constraints before prompting — this is the agent's contract |
| Codebase structure | Modular files under ~500 lines, explicit types, instruction files — designed for agent context windows |
| Fast feedback loops | Tests and type checking are agent guardrails, not just quality tools |
| CAP theorem | AI defaults to PostgreSQL; you decide the actual consistency trade-off for each feature based on whether the data can tolerate staleness |
| Back-of-the-envelope math | Validate AI assumptions against real scale — both RPS and AI token costs — before any design decision is locked in |
The engineers who get the most out of AI coding tools are not the ones who prompt the most fluently — they're the ones who understand systems deeply enough to catch what the AI gets wrong, and to design the constraints that make the AI get it right in the first place.
Sources:
- Agentic Engineering: The Complete Guide to AI-First Software Development Beyond Vibe Coding (2026)
- The rise of vibe coding: Why architecture still matters in the age of AI agents
- Vibe coding is not the same as AI-Assisted engineering
- AI Technical Debt: How Vibe Coding Increases TCO
- The Reality of Vibe Coding: AI Agents and the Security Debt Crisis
- What is Agentic Engineering? | IBM