Clarifying Requirements: Ask First, Design Second
Few mistakes waste more effort in software engineering than starting to design (or worse, starting to code) before you truly understand what you are building and why.
Before drawing a single diagram or writing a single line of code, the most important thing you can do is ask: "What problem are we actually solving?"
This is not a warm-up exercise. It is the foundation of every decision that follows. The architecture you choose should be a direct response to your requirements — and if your requirements are wrong, vague, or missing, your architecture will be wrong too. No amount of clever code can fix a system designed to solve the wrong problem.
Why Requirements Drive Everything
Every architectural decision you make — which database to use, whether to add a cache, whether to split into microservices — is a trade-off: choosing one benefit often means sacrificing another. And you cannot evaluate a trade-off without knowing what you are optimizing for.
Consider this: "Design a chat application." Without further context, this prompt could describe:
- A real-time customer support widget for a SaaS product, used by thousands of support agents
- A private messaging app for a small team of 20 people
- A live chat system for a gaming platform with millions of concurrent users
These three systems share the same description but require radically different architectures, because they differ fundamentally in scale, concurrency demands, and data requirements. The first might need persistent message history and read receipts. The second might work fine as a simple REST API with no real-time requirement. The third needs WebSocket scaling, message fan-out across data centers, and careful capacity planning.
The requirements determine which world you are in.
Requirements fall into three categories. Functional requirements define what the system does — the features users interact with. Non-functional requirements define how well the system must perform — its speed, reliability, and security. Constraints define the boundaries you must operate within — budget, team size, existing technology, legal compliance. All three feed into the design.
Functional Requirements: Scoping What You Build
Functional requirements describe user-visible behavior. The core question is: what does the system need to do, and what is explicitly out of scope?
The most important skill here is ruthless scoping. When faced with an open-ended system ("build a social media platform"), the right move is to identify the 3–5 core features that define the minimum viable system and explicitly defer everything else.
Key questions to ask:
- What are the core actions a user can perform?
- What data is created, read, updated, or deleted?
- Are there multiple user roles with different permissions?
- Is the workload primarily reads, writes, or balanced?
- What does success look like for the first version?
Example: "Design Twitter" → Functional scope might be: users can post tweets (280 characters), follow other users, and see a feed of tweets from people they follow. Out of scope for now: direct messages, media uploads, trending topics, advertising.
Defining scope explicitly does two things: it tells you what to design, and it tells you what not to design. Every component you don't add is complexity you don't have to operate, debug, or scale.
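As a rough illustration, the scope from the "Design Twitter" example can be written down as two explicit sets, so any feature request is either built, deferred, or flagged as a question that still needs an answer. The feature names and the `triage` helper are hypothetical, chosen only for this sketch.

```python
# Scope lists taken from the "Design Twitter" example; the labels are illustrative.
IN_SCOPE = {"post tweet", "follow user", "view feed"}
OUT_OF_SCOPE = {"direct messages", "media uploads", "trending topics", "advertising"}


def triage(feature):
    """Classify a requested feature against the agreed scope."""
    if feature in IN_SCOPE:
        return "build"
    if feature in OUT_OF_SCOPE:
        return "defer"
    # Anything not listed is a requirements gap, not something to assume.
    return "undecided: clarify before designing"


print(triage("follow user"))      # build
print(triage("direct messages"))  # defer
print(triage("search"))           # undecided: clarify before designing
```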
Non-Functional Requirements: Quantify or They Are Meaningless
Non-functional requirements (NFRs) describe the quality attributes of the system. The critical rule is: always quantify them. "The system should be fast" is not a requirement. "p99 read latency under 200ms" is a requirement.
A quick note on latency percentiles: p50 is the median, meaning 50% of requests complete within this time. p99 means 99% of requests complete within this time, so only 1 in 100 requests is slower. Engineers care about p99 because it captures the tail: the slow requests that real users still hit regularly and that an average hides. A system where the average response is fast but the slowest 1% of requests take 5 seconds will still frustrate users.
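To make the difference between an average and a percentile concrete, here is a minimal sketch that computes p50 and p99 with the nearest-rank method. The latency samples are invented for illustration, and nearest-rank is only one of several common percentile definitions; monitoring tools may interpolate differently.

```python
import math


def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical data: 98 fast requests and 2 very slow ones.
latencies_ms = [40] * 98 + [5000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.1f}ms p50={p50_ms}ms p99={p99_ms}ms")
# mean=139.2ms p50=40ms p99=5000ms
```

The mean and the median both look healthy here, while p99 exposes the 5-second requests.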
Without quantification, you cannot make trade-off decisions objectively; you end up making architectural choices based on gut feeling rather than data. You can't choose between two database designs without knowing whether you need 10ms response times or whether 500ms is acceptable. You can't decide whether to add a caching layer without knowing the read volume. Vague requirements lead to architectures that seem sound in theory but fail in production.
| NFR | Vague (Useless) | Quantified (Actionable) |
|---|---|---|
| Availability | The system should be highly available | 99.9% uptime = max ~8.76 hours downtime/year |
| Latency | Responses should be fast | p99 read latency under 200ms; p50 under 50ms |
| Throughput | Should handle lots of traffic | Must sustain 10,000 writes/second at peak load |
| Durability | Data should not be lost | No acknowledged write is ever lost; writes are acknowledged only after being persisted to disk |
| Consistency | Data should be accurate | Reads must reflect the most recent write (strong consistency required) |
| Security | The system should be secure | All PII encrypted at rest (AES-256); GDPR-compliant data residency in EU |
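The downtime figure in the availability row follows from simple arithmetic. The sketch below converts an availability percentage into a yearly downtime budget, assuming a 365-day year; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def downtime_hours_per_year(availability_percent):
    """Maximum yearly downtime permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_hours_per_year(target):.2f} hours/year")
# 99.0% -> 87.60 hours/year
# 99.9% -> 8.76 hours/year
# 99.99% -> 0.88 hours/year
```

Each additional "nine" cuts the budget by a factor of ten, which is why the target needs to be stated as a number before choosing a redundancy design.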
The four NFRs that matter most for most systems, and the questions that quantify them:
Availability: What is the acceptable downtime per year? Is this a life-critical system where even minutes of downtime cause significant harm, or is brief unavailability tolerable?
Latency: What response time do users expect? Interactive features (search, checkout) require fast responses because users wait for the result. Background jobs like sending emails or generating reports can tolerate delays of seconds or minutes. As a general guideline, interactive features typically need p99 latency under 200ms.
Scale: How many users do you have today, how many in 12 months, and what does peak traffic look like relative to average? A steady workload and a spiky one (such as flash sales or viral moments) require very different approaches to capacity.
Consistency: When two users read the same data simultaneously, do they need to see exactly the same value? Strong consistency means every read reflects the most recent write — essential for financial transactions where reading a stale balance is dangerous. Eventual consistency means different users might see slightly different values for a short time, but the system guarantees they will converge to the same value eventually — acceptable for a social media "like" count, where a brief discrepancy of a few seconds causes no real harm. This question drives your database choice more than almost any other requirement.
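The difference between the two consistency models can be sketched with a toy primary/replica pair. This is a deliberately simplified model, not how any particular database implements replication: the `sync()` call stands in for asynchronous replication, and all names are hypothetical.

```python
class ReplicatedCounter:
    """Toy primary/replica pair; the replica only catches up when sync() runs."""

    def __init__(self):
        self.primary = 0
        self.replica = 0

    def write(self, value):
        self.primary = value  # the replica is not updated yet

    def sync(self):
        self.replica = self.primary  # stands in for asynchronous replication

    def read_strong(self):
        return self.primary  # always reflects the most recent write

    def read_eventual(self):
        return self.replica  # may be stale until sync() has run


likes = ReplicatedCounter()
likes.write(42)
stale = likes.read_eventual()      # 0: the replica has not caught up
fresh = likes.read_strong()        # 42
likes.sync()
converged = likes.read_eventual()  # 42 once replication has been applied
```

A stale `0` is harmless for a like count and dangerous for an account balance, which is why the requirement has to be decided per piece of data.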
The Four Non-Functional Properties Every System Must Define
These four properties are the inputs to your architecture's most consequential decisions. Every property has a cost — optimizing for one often makes another harder. Defining them upfront lets you make those trade-offs deliberately instead of by accident.
Scale: The Requirement That Changes Everything
Scale is the most commonly misunderstood requirement — and the one that causes the most premature architectural complexity.
The single most important thing to understand about scale is this: the architecture that is correct at 100 users is not the architecture that is correct at 100 million users — and building for 100 million users when you have 100 is not ambitious, it is wasteful.
| Scale | Appropriate Architecture | What Over-Engineering Adds |
|---|---|---|
| 1–1,000 users | Single server, single PostgreSQL database, no caching layer, no queue | Redis, CDN, read replicas, message queues — all operational overhead with zero benefit |
| 1,000–100,000 users | Horizontal app scaling, database indexes, basic CDN, maybe one cache layer | Sharding, microservices, distributed tracing — complexity the team cannot operate |
| 100,000–10M users | Read replicas, caching layer, async job queues, database connection pooling | Multi-region deployment, custom consensus protocols — engineering months ahead of actual need |
| 10M+ users | Horizontal sharding, multi-region, specialized storage per workload, CDN at edge | This architecture is justified only here — and most systems never reach this scale |
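The tiers in the table can be expressed as a simple lookup. This is only a sketch: the table's ranges share their boundary values, so treating each upper bound as inclusive is an assumption made here, and the descriptions are abbreviated from the table.

```python
# (inclusive upper bound on users, abbreviated architecture from the table)
TIERS = [
    (1_000, "single server, single PostgreSQL database"),
    (100_000, "horizontal app scaling, indexes, basic CDN, maybe one cache layer"),
    (10_000_000, "read replicas, caching layer, async job queues, connection pooling"),
]
TOP_TIER = "horizontal sharding, multi-region, specialized storage, CDN at edge"


def architecture_tier(users):
    """Return the table's suggested architecture for a given user count."""
    for upper_bound, description in TIERS:
        if users <= upper_bound:
            return description
    return TOP_TIER


print(architecture_tier(500))         # single server, single PostgreSQL database
print(architecture_tier(50_000_000))  # horizontal sharding, multi-region, ...
```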
The questions that define your scale requirement:
- Current state: How many users does the system serve today?
- Growth trajectory: What is the realistic user count in 6 and 12 months?
- Traffic pattern: Is load roughly constant, or does it spike (e.g., flash sales, event-driven)?
- Read/write ratio: Is the workload mostly reads (news feed), mostly writes (logging), or balanced (social posting)?
- Data volume: How many records today, and how fast does that grow?
Once you have these numbers, you can do a simple back-of-the-envelope calculation to sanity-check your architecture:
| Step | Calculation | Example Result |
|---|---|---|
| Start with Daily Active Users (DAU) | Given: 50,000 DAU | 50,000 users/day |
| Estimate total daily requests (assume 30 requests per user per day) | 50,000 × 30 | 1.5M requests/day |
| Convert to average requests per second (RPS) | 1.5M ÷ 86,400 seconds/day | ≈ 17 RPS average |
| Design for peak traffic (3–5× average) | 17 × 4 | ≈ 68 RPS peak |
| Size application instances | 68 RPS ÷ ~200 RPS/instance | 1 instance is sufficient with headroom |
At roughly 68 peak RPS, a single application instance handles the load comfortably. A single-instance system has no distributed complexity: no need to manage multiple servers, coordinate between them, or debug the class of failures that only appear across a network. On throughput grounds alone, there is no case for load balancing, horizontal scaling, or read replicas at this scale; a second instance would be justified only by an availability target, not by load. Adding those components for performance creates operational overhead (more things to configure, monitor, and debug) with no performance benefit.
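The estimate above can be reproduced in a few lines. The inputs (30 requests per user per day, a 4× peak multiplier, roughly 200 RPS per instance) are the same illustrative assumptions used in the table, not measured values. Because this sketch does not round intermediate results, the peak comes out near 69 RPS rather than the table's 68.

```python
import math

SECONDS_PER_DAY = 86_400


def estimate_capacity(dau, requests_per_user_per_day, peak_multiplier, rps_per_instance):
    """Back-of-the-envelope sizing from daily active users to instance count."""
    requests_per_day = dau * requests_per_user_per_day
    average_rps = requests_per_day / SECONDS_PER_DAY
    peak_rps = average_rps * peak_multiplier
    instances = max(1, math.ceil(peak_rps / rps_per_instance))
    return {
        "requests_per_day": requests_per_day,
        "average_rps": average_rps,
        "peak_rps": peak_rps,
        "instances": instances,
    }


estimate = estimate_capacity(
    dau=50_000,
    requests_per_user_per_day=30,
    peak_multiplier=4,
    rps_per_instance=200,
)
print(f"average ~{estimate['average_rps']:.0f} RPS, "
      f"peak ~{estimate['peak_rps']:.0f} RPS, "
      f"instances needed: {estimate['instances']}")
# average ~17 RPS, peak ~69 RPS, instances needed: 1
```

Re-running the function with 10× the users is a quick way to check how far the current design stretches before it needs to change.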
The practical rule: Design for roughly 10× your current scale. If you have 1,000 users, design for 10,000. This gives you room to grow without requiring a redesign, but does not force you to build infrastructure for a scale you may never reach.
The Cost of Skipping Requirements: YAGNI
The YAGNI principle — You Aren't Gonna Need It — is the antidote to over-engineering. It comes from Extreme Programming and states: implement only what is required right now. Do not add components because they might be useful later.
YAGNI is not about being short-sighted. It is about recognizing that software requirements change, and that code written for hypothetical future requirements is often ill-suited for those requirements when they actually arrive — because the future was not what you imagined.
The Over-Engineering Trap
Every component added before it is needed adds operational overhead: it must be deployed, monitored, debugged, and secured. The trap is that each addition seems individually reasonable — it is only when you look at the full picture that the cost becomes clear.
A real-world case illustrates the cost. In 2018, TSB Bank's IT migration failed catastrophically: customers were locked out of accounts for weeks, costing over £500 million in remediation. Post-mortems identified that new features were added during the migration, requirements were not frozen, and the platform's architectural design was undocumented, so there was no way to verify whether the delivered system matched the intended design.
The lesson is not unique to banking. Architecture documentation and frozen requirements are not bureaucratic overhead. They are the mechanism by which you verify the system you built is the system you intended to build.
A Requirements Checklist
Before designing any system, answer each of these questions. If you cannot answer a question, that is a gap in your requirements — not something to assume.
| Category | Question | Why It Matters |
|---|---|---|
| Functional scope | What are the 3–5 core features? What is explicitly out of scope? | Defines what you are building and prevents scope creep |
| Users | How many users today? What is the realistic 12-month projection? | Drives scale decisions; determines appropriate architecture tier |
| Traffic pattern | Is load constant or spiky? What does peak look like relative to average? | Determines whether you need auto-scaling, queuing, or burst capacity |
| Read/write ratio | Is the workload read-heavy, write-heavy, or balanced? | Drives database choice, caching strategy, and replica topology |
| Availability | What is acceptable downtime per year? | Determines redundancy requirements and failover design |
| Latency | What are the p50 and p99 latency targets for key operations? | Determines whether synchronous or asynchronous patterns are needed |
| Consistency | Can users tolerate briefly stale data, or must every read reflect the latest write? | Drives database selection and replication strategy |
| Data durability | What happens if a write is lost? Is any data loss acceptable? | Determines persistence guarantees and backup strategy |
| Compliance | Are there regulatory requirements (GDPR, HIPAA, PCI-DSS)? | May mandate data residency, encryption, audit logs, or access controls |
| Team & operations | What is the team's operational expertise? What is the sustainable on-call burden? | A technically superior architecture the team cannot operate is a bad architecture |
| Budget | What is the monthly infrastructure budget? | Eliminates architectures that are technically valid but economically infeasible |
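One way to make unanswered questions visible is to hold the checklist as a data structure and list whatever is still empty. This is a minimal sketch: the field names mirror the table's categories but are otherwise hypothetical, and free-text strings are used for most answers to keep the example short.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RequirementsChecklist:
    """One field per checklist row; None means 'not answered yet'."""
    core_features: Optional[str] = None
    users_today: Optional[int] = None
    users_in_12_months: Optional[int] = None
    traffic_pattern: Optional[str] = None
    read_write_ratio: Optional[str] = None
    availability_target: Optional[str] = None
    latency_targets: Optional[str] = None
    consistency: Optional[str] = None
    durability: Optional[str] = None
    compliance: Optional[str] = None
    team_and_operations: Optional[str] = None
    monthly_budget: Optional[str] = None


def unanswered(checklist):
    """Return the names of checklist items that still have no answer."""
    return [f.name for f in fields(checklist) if getattr(checklist, f.name) is None]


# Hypothetical draft with only a few answers filled in.
draft = RequirementsChecklist(
    core_features="post, follow, view feed",
    users_today=1_000,
    availability_target="99.9%",
)
gaps = unanswered(draft)
print(f"{len(gaps)} open questions: {', '.join(gaps)}")
```

Every name in `gaps` is a question to take back to stakeholders rather than a value to guess.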
Applying Requirements Clarification with AI Coding Agents
Requirements clarification is more important in the age of AI agents, not less. An AI agent that starts in the wrong direction amplifies that mistake at high speed — generating thousands of lines of code for the wrong architecture before anyone notices. The cost of a wrong direction scales with the speed of the tool.
The effective pattern for working with AI agents is: plan first, code second.
Practical rules for directing AI agents with requirements:
Always specify constraints explicitly. AI agents fill gaps in requirements with defaults from their training data, and those defaults may not match your context. An agent is likely to reach for common choices such as PostgreSQL, synchronous HTTP calls, and a monolithic structure unless you specify otherwise. If you need eventual consistency, say so. If you need to stay within a $50/month infrastructure budget, say so. If you need HIPAA compliance, say so. The agent cannot infer constraints it has not been told about.
Ask the agent to ask you questions first. Before asking an agent to implement anything non-trivial, prompt it: "Before you write any code, what clarifying questions do you have about the requirements?" A well-prompted agent will surface ambiguities — authentication mechanism, expected request volume, whether multi-tenancy is needed — that you may have overlooked. This is the same conversation a senior engineer would have before starting a feature.
Scope each session to one requirement. Agent output quality degrades as prompt complexity increases, because the agent has more implicit choices to make and less focus on each individual requirement. A single focused requirement — "implement user authentication using NextAuth with GitHub OAuth, storing sessions in the Postgres schema in prisma/schema.prisma, with no other changes to existing code" — produces more reliable results than a multi-requirement prompt. Break your requirements checklist into individual implementation units and deliver them sequentially.
Use CLAUDE.md to persist requirements across sessions. AI coding agents generally carry no memory from one session to the next. A requirement you communicated in Monday's session is unknown on Tuesday unless it is in the project's instruction file. The CLAUDE.md pattern (the project instruction file that Claude Code reads at the start of a session) encodes project-level requirements such as architecture decisions, forbidden patterns, required test coverage, and compliance constraints, so that every session starts from the same foundation.
Review architecture, not just code. When the agent produces a plan or implementation, evaluate it against your requirements checklist, not just whether the code runs. Does the data model support the stated consistency requirement? Does the API design handle the stated throughput? Does the caching strategy introduce consistency bugs you said you cannot tolerate? The questions come from the requirements you defined before the agent started.
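Several of the rules above can be combined into a small helper that assembles a brief for the agent: one requirement, explicit constraints, explicit exclusions, and a closing request for clarifying questions. The function name and the layout of the brief are assumptions made for this sketch; no agent requires this particular format.

```python
def build_agent_brief(task, constraints, out_of_scope):
    """Assemble a single-requirement brief that states constraints explicitly."""
    lines = ["Task (one requirement only):", f"  {task}", "", "Constraints:"]
    lines += [f"  - {c}" for c in constraints]
    lines += ["", "Out of scope (do not change):"]
    lines += [f"  - {item}" for item in out_of_scope]
    lines += [
        "",
        "Before you write any code, what clarifying questions do you have "
        "about the requirements?",
    ]
    return "\n".join(lines)


brief = build_agent_brief(
    task="Implement user authentication using NextAuth with GitHub OAuth",
    constraints=[
        "Store sessions in the Postgres schema in prisma/schema.prisma",
        "Infrastructure budget: $50/month",
    ],
    out_of_scope=["Any other changes to existing code"],
)
print(brief)
```

Generating one brief per checklist item keeps each session scoped to a single requirement.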
Summary
| Principle | What It Means in Practice |
|---|---|
| Ask before you design | Requirements are not a warm-up — they are the inputs to every architectural decision. Without them, trade-offs are arbitrary |
| Functional requirements = scope | Identify the 3–5 core features and explicitly defer everything else. Stating what is out of scope matters as much as stating what is in scope |
| Non-functional requirements must be quantified | 'Fast' and 'reliable' are not requirements. 'p99 < 200ms' and '99.9% uptime' are requirements |
| Scale defines your architecture tier | 100 users and 100M users require different architectures. Build for your actual scale, not for the ceiling |
| YAGNI: don't build what you don't need yet | Every component added before it is required adds operational overhead — and is often ill-suited for the actual future requirement when it arrives |
| AI agents amplify both speed and mistakes | Wrong requirements at AI speed produce wrong code at scale. Requirements clarification is a multiplier on agent effectiveness |
| Plan first, code second | Write requirements.md, get the agent to produce a plan, review the plan before any code is written |
The most valuable skill in system design is not knowing every architecture pattern. It is knowing which questions to ask before you start, and having the discipline to answer them fully before drawing a single box.