Patterns in Claude Code
The previous sections covered what AI agents are and how to build the harness that turns a language model into one — the execution loop, tool dispatch, context management, guardrails, and observability. Claude Code is a production implementation of all of those ideas: Anthropic's official agentic CLI that reads your codebase, edits files, runs shell commands, and iterates autonomously until a task is complete.
What makes Claude Code worth studying is not just that it works, but how it is built. It is a concrete case study of the harness engineering principles from the previous chapter — applied at scale to a real coding agent. Understanding its patterns gives you a clear mental model for using Claude Code effectively, and a practical blueprint for building similar systems yourself.
New to Claude Code? If you want a hands-on introduction to installing and using Claude Code, start with the Getting Started with Claude Code tutorial first, then come back here for the deep dive into its architecture and design patterns.
What Is Claude Code?#
Claude Code is a command-line tool (claude) that runs in your terminal, pointed at a project directory. Unlike a chatbot that produces a single response, Claude Code is an autonomous coding agent: it reads files, understands codebase structure, and takes sequential actions — editing files, running tests, searching for patterns — until the task is done or it needs to ask you something.
```bash
cd your-project
claude "Add input validation to all API endpoints and write tests for each one"
```
After that command, Claude Code will read relevant files, make targeted edits across multiple files, run the test suite to verify its work, and fix any failures — all without you directing each step.
Claude Code's Operating Scope
Claude Code is not a chat interface with file access bolted on. It is an agent loop that drives a purpose-built tool suite. The model decides what to read, what to edit, and what to run — your code never leaves your machine.
A note on availability: Claude Code is available as a terminal command, as VS Code and JetBrains extensions, and as a desktop application. It can also run inside GitHub Actions for automated coding tasks in CI/CD pipelines.
The rest of this chapter walks through the master agent loop, the tool suite it drives, and the six patterns built on top of them: tool dispatch, lazy loading, context compression, structured planning, sub-agent delegation, and cross-session memory. Claude Code's actual implementation is more sophisticated than what we describe here, but the core ideas are the same, and understanding them will help you both use Claude Code more effectively and build similar agents yourself.
The Master Agent Loop#
Every capability of a coding agent — reading files, writing edits, running tests, spawning sub-agents — is powered by a single, repeating pattern called the master agent loop.
The loop works like this: every response from the language model either contains a tool_use block (the model wants to call a tool) or it does not (the model considers its task done). If the model called a tool, execute the tool, append the result to the conversation history, and call the model again. Repeat until the model produces a response with no tool calls. The entire system reduces to roughly 30 lines of logic.
The Master Agent Loop
The complete core of a coding agent. Every file read, every edit, every bash command, every sub-agent spawn passes through this same loop. Adding a new capability means adding one tool handler — the loop itself never changes.
Why this design wins: The simplicity of the loop is its strength. Some agent systems use complex state machines, branching agent graphs, and parallel reasoning threads. A single-threaded loop is easier to debug, easier to extend, and easier to reason about. The core insight: the model is the agent — the system's job is to give it good tools and stay out of the way.
The Tool Suite#
Rather than giving the agent unrestricted access to the shell, the system provides a set of purpose-built, sandboxed tools — each designed for a specific category of action and constrained to prevent accidental damage. This is an important design choice: a narrowly defined tool is easier for the model to use correctly, produces predictable results, and can have safety rules enforced at the tool boundary rather than relying solely on the model's judgment.
| Category | Tool | What It Does | Why a Dedicated Tool? |
|---|---|---|---|
| Reading | View (Read) | Read file contents, capped at a default line limit (for example, 2,000 lines) | Enforces read-only semantics — cannot accidentally write when reading |
| Reading | LS | List directory contents | Scoped to the project; cannot accidentally list sensitive system directories |
| Searching | Glob | Wildcard file search (**/*.ts, src/**/*.py) | Faster and more structured than bash find for pattern matching |
| Searching | Grep | Full regex content search across files | The model writes its own regex patterns — no external index or vector database needed |
| Editing | Edit | Surgical replacement: old string → new string | Minimal blast radius; only the specified text changes, nothing else |
| Editing | Write | Whole-file create or overwrite | Used only when creating new files — Edit is preferred for changes to existing files |
| Execution | Bash | Run shell commands in a persistent session | Risk-classified before execution; write operations and deletions require confirmation |
| Planning | TodoWrite | Create and update a structured task list | Prevents the model from losing track of its plan across many loop iterations |
| Delegation | Task | Spawn a focused sub-agent with a clean context | Offloads context-heavy exploration; only a text summary returns to the parent |
| Parallel | BatchTool | Execute multiple tools simultaneously in one round-trip | Reduces sequential overhead when multiple independent reads are needed at once |
The key insight — regex over vector databases: A common assumption is that a coding agent needs an embedding index to efficiently search a large codebase. It does not. The model writes its own regex patterns and uses Grep to search code directly. This eliminates the complexity of maintaining an index, works on any codebase without setup, and handles code that has never been indexed. The tradeoff is that pattern-based search requires the model to know what to look for — which works well for concrete code structures (function names, imports, error codes) but less well for abstract semantic queries like "find all code related to authentication" when there is no clear pattern to match.
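To make the regex-over-index idea concrete, here is a minimal sketch of what a Grep-style handler could look like. The function name `run_grep`, its parameters, and the output format are assumptions chosen for illustration, not Claude Code's actual implementation.

```python
import re
from pathlib import Path


def run_grep(pattern: str, root: str, glob: str = "**/*.py", max_hits: int = 50) -> list[str]:
    """Return 'path:line_no: text' for each line that matches the regex."""
    regex = re.compile(pattern)
    hits: list[str] = []
    for path in sorted(Path(root).glob(glob)):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for line_no, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append(f"{path.relative_to(root)}:{line_no}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits  # cap output so one search cannot flood the context
    return hits
```

No index is built or maintained here: every call walks the files directly, which is why the approach works on a codebase the agent has never seen. The `max_hits` cap is one assumed way to keep a broad pattern from filling the context window.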
Tool safety is enforced by path sandboxing: every file path is validated against the working directory. A tool call that attempts to read or write outside the project root is rejected. Bash commands classified as risky (file deletion, network operations, system modifications) trigger a permission prompt before execution.
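The path check described above can be sketched with `pathlib`. This is an illustrative version under the assumption that the sandbox is a single project root. The helper name `resolve_in_project` is hypothetical.

```python
from pathlib import Path


def resolve_in_project(project_root: str, requested: str) -> Path:
    """Resolve `requested` and refuse anything outside `project_root`."""
    root = Path(project_root).resolve()
    # Joining with an absolute `requested` discards root, which is fine:
    # the containment check below still rejects it.
    target = (root / requested).resolve()
    if target != root and root not in target.parents:
        raise PermissionError(f"{requested!r} is outside the project root")
    return target
```

Resolving before comparing matters: it collapses `..` segments and follows symlinks, so a path such as `src/../../etc/passwd` is rejected even though it starts inside the project.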
Pattern 1: The Tool Dispatch Map#
The agent loop needs a way to route each tool call to the right handler function. The tool dispatch map solves this with a simple dictionary that connects tool names (strings) to handler functions. When the model calls a tool by name, the loop looks up that name in the map and calls the corresponding function. Adding a new capability requires exactly two things: a handler function and an entry in the map.
```python
TOOL_HANDLERS = {
    "bash": lambda **kw: run_bash(kw["command"]),
    "view": lambda **kw: run_read(kw["path"], kw.get("limit")),
    "write_file": lambda **kw: run_write(kw["path"], kw["content"]),
    "edit_file": lambda **kw: run_edit(kw["path"], kw["old_text"], kw["new_text"]),
    "glob": lambda **kw: run_glob(kw["pattern"]),
    "grep": lambda **kw: run_grep(kw["pattern"], kw.get("path")),
    "todo_write": lambda **kw: run_todo(kw["todos"]),
}


# The loop never changes — only the dispatch map does
def execute_tool(tool_name: str, tool_inputs: dict) -> str:
    handler = TOOL_HANDLERS.get(tool_name)
    if not handler:
        return f"Error: unknown tool '{tool_name}'"
    return handler(**tool_inputs)
```
Each tool also has a schema — a JSON description that tells the model what tools exist and how to call them. Before calling a tool, the model reads these schemas and generates a valid JSON argument object that matches the declared structure:
```python
TOOL_SCHEMAS = [
    {
        "name": "edit_file",
        "description": "Make a targeted edit to a file by replacing old_text with new_text. "
                       "Use this for surgical changes — do not rewrite the whole file unless necessary.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Absolute path to the file"},
                "old_text": {"type": "string", "description": "The exact text to replace"},
                "new_text": {"type": "string", "description": "The replacement text"},
            },
            "required": ["path", "old_text", "new_text"],
        },
    },
    # ... other tools
]
```
The description field is critical: it is the primary signal the model uses to decide when to call a tool and what arguments to provide. A vague description causes the model to misuse the tool or choose the wrong one. Treat tool descriptions with the same care as a public API contract — they are the interface between your code and the model's reasoning.
| Tool Design Principle | Good Example | Bad Example |
|---|---|---|
| Specific names | search_product_catalog(query, max_results) | search(input) — the model cannot tell what this searches |
| Scoped return values | Return the top 5 results with only the fields the model needs | Return the entire raw API response — bloats the context with irrelevant fields |
| Atomic operations | create_calendar_event(title, time, attendees) — does one thing | manage_calendar(action, ...) — the action parameter makes it ambiguous |
| Idempotent where possible | Read operations: always safe to retry | Write operations without deduplication: retries create duplicate records |
Pattern 2: Tool Lazy Loading (ToolSearch)#
A coding agent might have 40+ built-in tools and potentially hundreds more from external plugins (MCP servers). Sending every tool's full schema — name, description, and parameter definitions — to the model on every API call wastes thousands of context tokens on tools the model will never use in a given conversation. Most tasks only need a handful of tools. The rest are dead weight in the prompt.
Tool lazy loading solves this by splitting tools into two tiers: a small set of always-available tools (Read, Edit, Bash, Grep, etc.) and a larger set of deferred tools whose schemas are only loaded when explicitly requested. The model sees the names of deferred tools in a brief list, but not their full schemas. When it needs one, it calls ToolSearch to load that tool's schema into context — after which the tool is available for the rest of the session.
Lazy Tool Loading via ToolSearch
Instead of sending 40+ full tool schemas on every API call, the harness sends only the core tools upfront. Deferred tools are listed by name only. When the model needs a deferred tool, it calls ToolSearch, which returns a tool_reference block that expands the full schema into context. Once discovered, the tool remains available for the rest of the session.
The lifecycle works like this:
- Turn 1: The model sees core tools (Read, Edit, Bash, Grep, Glob, ToolSearch) with full schemas, plus a system reminder listing deferred tool names only
- Turn N: The model decides it needs a deferred tool (e.g., WebFetch) and calls ToolSearch({ query: "select:WebFetch" })
- ToolSearch returns a tool_reference block — a special response type that tells the API to expand that tool's full schema into context
- Turn N+1 onward: The tool is now fully available. The harness tracks discovered tools and includes them in all subsequent API calls automatically
Surviving compaction: When context compression (Pattern 3) replaces the conversation history with a summary, the message containing the original tool_reference block may be discarded. To prevent losing access to discovered tools, the compaction process preserves a list of all previously discovered tool names. These tools continue to be included in subsequent API calls even after their discovery messages are gone.
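The harness-side bookkeeping for this lifecycle can be sketched as a small registry. The class and method names below (`ToolRegistry`, `tool_search`, `schemas_for_request`) are hypothetical, and the sketch models only the tracking logic, not the tool_reference response type.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRegistry:
    core: dict[str, dict]      # name -> full schema, sent on every call
    deferred: dict[str, dict]  # name -> full schema, sent only once discovered
    discovered: set[str] = field(default_factory=set)

    def tool_search(self, name: str) -> str:
        """Handler for a ToolSearch-style call: mark a deferred tool as discovered."""
        if name not in self.deferred:
            return f"Error: no deferred tool named '{name}'"
        self.discovered.add(name)
        return f"Loaded schema for '{name}'"

    def schemas_for_request(self) -> list[dict]:
        """Full schemas to include in the next API call."""
        loaded = [self.deferred[n] for n in sorted(self.discovered)]
        return list(self.core.values()) + loaded

    def deferred_names(self) -> list[str]:
        """Names announced to the model without their schemas."""
        return sorted(n for n in self.deferred if n not in self.discovered)
```

Because `discovered` lives in the registry rather than in the message history, it is unaffected when compaction discards the messages in which a tool was first loaded.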
Pattern 3: Context Compression Pipeline#
The master agent loop appends every tool result to the conversation history. The tool dispatch map (Pattern 1) lets the model call dozens of tools. Together, they create a problem: context grows with every iteration. A single 1,000-line file can cost several thousand tokens. A coding task that reads 10 files, runs 5 commands, and makes 20 edits can easily accumulate 100,000+ tokens, which crowds or overflows the context window and makes every model call more expensive.
Rather than a single compaction step, production coding agents use a multi-stage pipeline that runs before every API call. Each stage is progressively more aggressive — lightweight stages fire first, and the heavyweight full-compaction step only triggers if the cheaper stages were not enough. This cascading design minimizes latency in the common case while still handling worst-case context overflow.
The Context Compression Pipeline
Before every model call, the conversation passes through a cascade of reduction stages. Each stage is cheaper and less lossy than the next. Only when cheaper stages cannot bring context under the limit does the next stage fire. This ensures most requests pay near-zero compaction cost.
How Each Stage Works#
| Stage | Trigger | What It Does | Cost |
|---|---|---|---|
| Tool result budget | Every request | If tool results in a single message exceed 200K characters total, persists the largest results to disk and replaces with short previews (file path + first 2KB) | Zero — no API call, just local file I/O |
| History snip | Token count exceeds snip threshold | Drops the oldest messages entirely, without generating any summary. Records a snip boundary so the model knows context was removed | Zero — no API call, just array slicing |
| Microcompact | Every request (two strategies) | Time-based: if 60+ minutes since last activity, clears all but the 5 most recent tool results (cache is cold anyway). Count-based: if tool result count exceeds threshold, queues cache-edit deletions on the server side without modifying local messages | Zero for time-based (local mutation). Near-zero for count-based (server-side cache edit, no LLM call) |
| Context collapse | Token count exceeds collapse threshold | Identifies individual conversation segments that can be summarized independently. Each segment gets its own mini-summary, preserving more granular context than full compaction | One LLM call per segment — cheaper than full compaction since segments are small |
| Auto-compact | Token count exceeds auto-compact threshold (~92% of effective window) | Calls Claude to generate a structured 9-section summary of the entire conversation, then replaces all prior messages with that summary plus post-compact attachments | One full LLM call with the entire conversation as input — the most expensive stage |
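The cascade in the table can be sketched as a list of stages that run from cheapest to most expensive and stop as soon as the conversation fits. This is a simplified illustration with three stages instead of five. The message shape (`role` and string `content`), the four-characters-per-token estimate, and all function names are assumptions. Unlike the real tool result budget, stage 1 here runs only when the conversation is over the limit.

```python
from typing import Callable

Message = dict  # assumed shape: {"role": str, "content": str}


def estimate_tokens(messages: list[Message]) -> int:
    # Crude assumption: roughly four characters per token.
    return sum(len(m["content"]) for m in messages) // 4


def budget_tool_results(messages: list[Message], max_chars: int) -> list[Message]:
    """Stage 1: replace oversized tool results with a short preview."""
    out = []
    for m in messages:
        if m["role"] == "tool" and len(m["content"]) > max_chars:
            m = {**m, "content": m["content"][:max_chars] + "\n[truncated]"}
        out.append(m)
    return out


def snip_history(messages: list[Message], keep_last: int) -> list[Message]:
    """Stage 2: drop the oldest messages and leave a boundary marker."""
    if len(messages) <= keep_last:
        return messages
    marker = {"role": "user", "content": "[earlier messages were removed]"}
    return [marker] + messages[-keep_last:]


def compress(messages: list[Message], limit: int,
             summarize: Callable[[list[Message]], str],
             max_chars: int = 2000, keep_last: int = 20) -> list[Message]:
    """Run stages in order of cost, stopping once under `limit` tokens."""
    stages = [
        lambda ms: budget_tool_results(ms, max_chars),
        lambda ms: snip_history(ms, keep_last),
        # Stage 3 (full compaction): `summarize` stands in for an LLM call.
        lambda ms: [{"role": "user", "content": summarize(ms)}],
    ]
    for stage in stages:
        if estimate_tokens(messages) <= limit:
            break
        messages = stage(messages)
    return messages
```

The early `break` is the point of the design: a conversation that fits after truncating one large tool result never pays for a summarization call.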
The Auto-Compact Summary Structure#
When full compaction fires, it does not produce a generic "here's what happened" paragraph. It generates a structured 9-section summary designed to preserve everything the model needs to continue working effectively:
- Primary Request and Intent — all of the user's explicit goals
- Key Technical Concepts — technologies, frameworks, patterns discussed
- Files and Code Sections — specific files examined/modified/created, with code snippets
- Errors and Fixes — errors encountered and how they were resolved
- Problem Solving — problems solved and ongoing troubleshooting
- All User Messages — every non-tool-result user message (critical for understanding changing intent)
- Pending Tasks — tasks the user explicitly asked to work on
- Current Work — what was being worked on immediately before compaction
- Optional Next Step — the next step in line with the user's most recent request
This structure ensures that even after compaction, the model knows what the user wants, what has been tried, what files are involved, and what to do next. The summary is generated with a chain-of-thought scratchpad (stripped before storage) to improve quality.
Post-Compact Recovery#
After compaction replaces the conversation history, the harness regenerates several attachments that the model needs to continue working:
- File attachments — re-reads the 5 most recently accessed files (so the model does not need to re-read them)
- Plan attachment — current todo list, if one exists
- Deferred tools list — re-announces all deferred tools (since the original announcements were in discarded messages)
- Background agent status — status of any running sub-agents
- MCP server instructions — re-announces plugin configurations
This means the model resumes after compaction with its working set intact — it does not start from zero.
CLAUDE.md: The Persistence Layer You Control#
The compression pipeline handles within-session context pressure. But what about knowledge that should persist across sessions? That is the role of CLAUDE.md — a project configuration file loaded at the start of every new session, before the first model call.
CLAUDE.md is not part of the compaction pipeline — it is immune to compaction because it lives in the system prompt, not the conversation history. This makes it the right place for:
- Build commands and project setup instructions
- Coding conventions and architectural decisions
- Constraints the model must always respect (e.g., "never modify the migration files directly")
- Key file locations and project structure notes
Think of CLAUDE.md as the agent's persistent memory that you control directly. The compaction pipeline handles everything else automatically.
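In a home-built harness, the same idea amounts to reading the file once at session start and placing it in the system prompt instead of the message list. The sketch below assumes a single CLAUDE.md at the project root. The function name and the heading text it inserts are illustrative choices.

```python
from pathlib import Path


def build_system_prompt(base_prompt: str, project_root: str) -> str:
    """Append CLAUDE.md (if present) to the system prompt."""
    config = Path(project_root) / "CLAUDE.md"
    if not config.is_file():
        return base_prompt
    instructions = config.read_text(encoding="utf-8").strip()
    return f"{base_prompt}\n\n# Project instructions (CLAUDE.md)\n{instructions}"
```

Since the result is passed as the system prompt on every call, compaction of the conversation history never touches it.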
Pattern 4: Structured Planning with TodoWrite#
Multi-step coding tasks have a fundamental problem: a model working across many loop iterations can lose track of its overall plan. It may complete step 1, start step 2, get distracted by an unexpected finding, and never return to step 3 — with no indication that a step was skipped.
TodoWrite addresses this by maintaining a prioritized task list with explicit status tracking, visible to the model at every iteration. Think of it as the agent writing down its plan so it can refer back to it, the same way you would with a notepad when working through a complex task.
TodoWrite: Planning Under Pressure
The model creates a structured task list at the start of a complex task and updates it after every significant action. This creates a persistent, self-correcting plan that survives across many loop iterations. Only one task can be in_progress at a time. A reminder injection mechanism steers the model back if it gets distracted.
How TodoWrite Works in Practice#
Each entry in the todo list has three fields: a description (what to do), a status (pending, in_progress, or completed), and a priority (high, medium, low). The model writes the full list at the start of a complex task, then updates individual entries as it progresses — marking them in_progress when starting and completed when done.
The key constraint is that only one task can be in_progress at a time. This forces sequential focus: the model finishes what it started before moving on. If you need parallel work, that is what sub-agents are for — each sub-agent can have its own independent todo list.
The reminder injection mechanism: If the model goes three or more loop iterations without updating the todo list, the system automatically injects a <reminder>Update your todo list</reminder> tag into the next tool result. This is a concrete example of how production agent systems use soft constraints — you cannot force the model to do something, but you can reliably steer it toward the right behavior by injecting structured signals into its context at the right moment.
Survival across compaction: The todo list is preserved as a post-compact attachment (see Pattern 3). When context compression replaces the conversation history with a summary, the current plan state is re-attached so the model resumes with its task list intact — it does not lose track of what it was doing.
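The two mechanisms described above, the single in_progress constraint and the reminder injection, fit in a small class. This is a sketch, not Claude Code's implementation. The class name `TodoList` and its methods are hypothetical, and the harness is assumed to call `annotate` on every tool result that is not a todo update.

```python
from dataclasses import dataclass, field

VALID_STATUSES = {"pending", "in_progress", "completed"}


@dataclass
class TodoList:
    todos: list[dict] = field(default_factory=list)
    idle_iterations: int = 0  # loop iterations since the last todo update

    def write(self, todos: list[dict]) -> str:
        """Handler for a TodoWrite-style tool: validate, then replace the list."""
        if any(t["status"] not in VALID_STATUSES for t in todos):
            return "Error: status must be pending, in_progress, or completed"
        if sum(t["status"] == "in_progress" for t in todos) > 1:
            return "Error: only one task can be in_progress at a time"
        self.todos = todos
        self.idle_iterations = 0
        return f"Todo list updated ({len(todos)} items)"

    def annotate(self, tool_result: str) -> str:
        """Call on each non-todo tool result; adds a reminder after 3 idle iterations."""
        self.idle_iterations += 1
        if self.idle_iterations >= 3:
            return tool_result + "\n<reminder>Update your todo list</reminder>"
        return tool_result
```

Returning an error string instead of raising keeps the rejection inside the loop: the model sees the message as a tool result and can correct its next call.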
Pattern 5: Sub-Agents and Background Tasks#
TodoWrite keeps the model focused within a single loop — but some tasks are too large for one agent to handle alone. Consider: "Find all places in this codebase that handle authentication and summarize the approach." Answering it might require reading dozens of files — potentially 50,000 tokens of content. If the main agent reads all of these itself, every file accumulates in the main context. The result: every subsequent model call becomes more expensive, slower, and more likely to lose track of earlier context.
Sub-agents solve this by isolating context-heavy work. A sub-agent is another instance of the agent loop, started with a fresh, empty context and given a focused task. It does all the reading and reasoning it needs to, then returns only a compact text summary to the parent. The parent never accumulates the raw file contents — only the final answer.
But context isolation is only half the story. Production coding agents also need background execution — the ability to run long-running tasks (test suites, builds, multi-file refactors) without blocking the user's interactive conversation. The task system unifies both patterns under a single lifecycle: pending → running → completed/failed/killed.
Sub-Agents and the Task System
The Agent tool (aliased as Task) spawns sub-agents in multiple modes: synchronous (blocks the parent until done), background (runs concurrently, parent checks in later), worktree-isolated (operates on a separate git branch), and remote (runs on a separate machine). All modes follow the same lifecycle and return text summaries to the parent.
The Task Lifecycle#
Every sub-agent — whether synchronous, background, or worktree-isolated — follows the same lifecycle:
| Status | What It Means | What the Parent Can Do |
|---|---|---|
| pending | Task is created but not yet running (e.g., queued behind other tasks) | Wait — the task will start automatically |
| running | Sub-agent loop is actively executing (reading files, making edits, running commands) | Poll with TaskGet to check progress, or continue other work and check later |
| completed | Sub-agent finished its task and produced a result | Read the output with TaskOutput — this is the text summary that enters the parent's context |
| failed | Sub-agent encountered an unrecoverable error (max steps exceeded, tool crashed, context overflow) | Read the error, decide whether to retry with a different approach or handle the failure manually |
| killed | Parent or user explicitly stopped the task (e.g., it was taking too long or going in the wrong direction) | Inspect partial output if useful, then proceed with an alternative approach |
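The lifecycle in the table is a small state machine, and encoding the allowed transitions explicitly makes illegal moves (such as restarting a completed task) fail loudly. The sketch below is illustrative. The `Task` class is hypothetical, and allowing pending → killed is an assumption that the table does not state.

```python
from enum import Enum


class TaskStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    KILLED = "killed"


# Allowed transitions; terminal states have no outgoing edges.
TRANSITIONS = {
    TaskStatus.PENDING: {TaskStatus.RUNNING, TaskStatus.KILLED},  # pending -> killed is assumed
    TaskStatus.RUNNING: {TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.KILLED},
    TaskStatus.COMPLETED: set(),
    TaskStatus.FAILED: set(),
    TaskStatus.KILLED: set(),
}


class Task:
    def __init__(self, prompt: str):
        self.prompt = prompt
        self.status = TaskStatus.PENDING
        self.output = ""  # text summary (or partial output) returned to the parent

    def transition(self, new_status: TaskStatus, output: str = "") -> None:
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new_status.value}")
        self.status = new_status
        if output:
            self.output = output
```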
Sub-Agent Permission Scoping#
A critical safety feature is that sub-agents can be given restricted tool sets. The parent chooses which tools the sub-agent has access to:
- Read-only sub-agents (Glob, Grep, View only) — perfect for exploration tasks where the sub-agent should investigate and report but never modify anything
- Full-access sub-agents (all tools including Edit, Write, Bash) — for implementation tasks where the sub-agent needs to make changes
- Custom restrictions — any combination, such as allowing Bash but only for read-only commands (ls, find, grep)
This is defense-in-depth: even if a sub-agent's reasoning goes wrong, it physically cannot take actions outside its permitted scope. An exploration sub-agent that hallucinates a desire to "fix" something it found cannot actually execute an edit.
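With a dispatch map like the one in Pattern 1, permission scoping can be as simple as handing the sub-agent a filtered copy of the map. The names `scope_handlers`, `execute_scoped`, and `READ_ONLY_TOOLS` are hypothetical, and the tool names follow the earlier dispatch-map example.

```python
from typing import Callable

Handler = Callable[..., str]

READ_ONLY_TOOLS = frozenset({"glob", "grep", "view"})


def scope_handlers(all_handlers: dict[str, Handler], allowed: frozenset[str]) -> dict[str, Handler]:
    """Return a dispatch map containing only the tools a sub-agent may use."""
    return {name: fn for name, fn in all_handlers.items() if name in allowed}


def execute_scoped(handlers: dict[str, Handler], tool_name: str, tool_inputs: dict) -> str:
    handler = handlers.get(tool_name)
    if handler is None:
        # The tool is absent from this agent's map, so the call cannot run at all.
        return f"Error: tool '{tool_name}' is not permitted for this agent"
    return handler(**tool_inputs)
```

The enforcement happens in the harness, not in the prompt: a read-only sub-agent that asks for an edit receives an error string as its tool result, and no handler ever executes.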
Coordinator Mode: Multi-Agent Orchestration#
For complex tasks that naturally decompose into independent subtasks, the system supports a coordinator mode where the main agent acts purely as an orchestrator — it does not implement anything itself but instead delegates all work to specialized worker sub-agents.
The coordinator's workflow follows four phases:
- Research — spawn read-only sub-agents to explore the codebase and understand the current state
- Synthesis — analyze the research results and plan the implementation approach
- Implementation — spawn worker sub-agents (potentially in worktrees) to make the actual changes
- Verification — spawn sub-agents to run tests, verify the changes work, and report any issues
The coordinator only has access to three tools: Agent (to spawn workers), SendMessage (to communicate with running workers), and TaskStop (to kill workers that go off-track). It cannot directly read files, edit code, or run commands — forcing it to delegate everything.
This pattern is most valuable for large-scale tasks that span many files or require multiple independent changes — the kind of work where a single agent would exhaust its context window before finishing.
Pattern 6: Cross-Session Memory#
Patterns 1–5 all operate within a single conversation session — the loop, tools, compression, planning, and delegation all reset when the session ends. But sessions end. When you start a new conversation tomorrow, the agent knows nothing about what happened today — your preferences, the corrections you gave, the project decisions you made. Without a persistence layer that spans sessions, the agent repeats the same mistakes, asks the same questions, and ignores conventions you have already established.
Cross-session memory solves this by maintaining a collection of markdown files on disk — each capturing one fact about the user, a piece of feedback, project context, or a reference to an external system. These files are loaded into future sessions so the agent starts with relevant knowledge already available, without you needing to repeat yourself.
Cross-Session Memory Architecture
Memory enters the conversation through two channels: (1) an always-on system prompt injection that loads the MEMORY.md index, and (2) per-query semantic recall that selects up to 5 relevant memory files using a lightweight side-query to a smaller model. Memories are written through three paths: direct writes by the main agent, a background extraction agent that runs after each query, and periodic session memory notes.
The Four Memory Types#
Memory is not a dumping ground — it uses a strict four-type taxonomy to keep stored knowledge focused and actionable:
| Type | What It Stores | Example |
|---|---|---|
| user | Role, goals, expertise level, preferences — who the user is and how to tailor responses | "User is a data scientist focused on observability; deep Go expertise but new to React — frame frontend explanations using backend analogues" |
| feedback | Corrections AND confirmations of approach — both what to stop doing and what to keep doing | "Don't mock the database in tests — prior incident where mocked tests passed but prod migration failed. Confirmed: bundled PRs preferred for refactors in this area" |
| project | Ongoing work, decisions, deadlines, and incidents not derivable from code or git history | "Auth middleware rewrite is driven by legal/compliance requirements around session token storage, not tech debt — scope decisions should favor compliance over ergonomics" |
| reference | Pointers to where information lives in external systems | "Pipeline bugs are tracked in Linear project INGEST. Oncall latency dashboard: grafana.internal/d/api-latency" |
The critical design constraint: memory stores only what cannot be derived from the current project state. Code patterns, architecture, file paths, git history, and anything in CLAUDE.md are explicitly excluded — those can be read directly from the source. Memory captures the why behind decisions, the who behind preferences, and the where of external resources.
How Recall Works#
When the user submits a query, the system performs semantic recall in four steps:
- Scan — read all .md files in the memory directory (up to 200), parsing their YAML frontmatter (name, description, type)
- Filter — exclude memories already surfaced in this session (prevents re-surfacing the same memory repeatedly)
- Select — send the list of memory descriptions plus the user's query to a lightweight model (Sonnet), which picks up to 5 relevant memories based on name/description match
- Inject — read the selected files and inject their content as attachments to the current query, with freshness warnings for memories older than 1 day
The description field in each memory's frontmatter is what the selector reads — it never sees the full content during selection. This means a well-written description is the difference between a memory being found and being invisible. Write descriptions as if you are writing a search result snippet: specific, concrete, and containing the keywords someone would use to look for this information.
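The four recall steps can be sketched as follows. This is a simplified illustration: the frontmatter parser handles only flat `key: value` lines, not full YAML. The `select` callable stands in for the side-query to a lightweight model, freshness warnings are omitted, and all function names are hypothetical.

```python
from pathlib import Path
from typing import Callable


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat 'key: value' frontmatter between leading '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def recall(memory_dir: str, query: str, surfaced: set[str],
           select: Callable[[str, dict[str, str]], list[str]],
           max_files: int = 200, max_selected: int = 5) -> list[str]:
    """Scan, filter, select, and return the contents of relevant memory files."""
    descriptions: dict[str, str] = {}
    # 1. Scan: collect each file's description from its frontmatter.
    for f in sorted(Path(memory_dir).glob("*.md"))[:max_files]:
        # 2. Filter: skip memories already surfaced in this session.
        if f.name in surfaced:
            continue
        meta = parse_frontmatter(f.read_text(encoding="utf-8"))
        descriptions[f.name] = meta.get("description", "")
    # 3. Select: the selector sees descriptions only, never file bodies.
    chosen = [n for n in select(query, descriptions) if n in descriptions][:max_selected]
    # 4. Inject: return the chosen files' contents and record them as surfaced.
    surfaced.update(chosen)
    return [(Path(memory_dir) / n).read_text(encoding="utf-8") for n in chosen]
```

Passing the selector in as a function keeps the sketch testable with a simple keyword matcher, while a real harness would plug in a model call that returns file names.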
What Makes This Different from RAG#
Traditional retrieval-augmented generation (RAG) uses embedding vectors and similarity search over a large corpus. The memory system takes a different approach: structured files with a curated index, recalled by a language model rather than vector similarity. This has several advantages for an agent context:
- No embedding infrastructure needed — just markdown files on disk
- The selector understands context and intent, not just keyword similarity
- Each memory has explicit metadata (type, description) that aids selection
- Users can directly read, edit, and delete memory files — full transparency and control
- The index (MEMORY.md) provides a human-readable overview of all stored knowledge
The tradeoff is scale: vector search handles millions of documents efficiently, while this system caps at ~200 memories. For a personal coding agent, 200 well-curated memories is more than enough — the bottleneck is quality, not quantity.
How to Build a Minimum Version#
The patterns above can be implemented progressively. The open-source learn-claude-code reference project implements a coding agent's core from scratch in Python across 12 sessions, each adding one pattern on top of the previous. This progression also serves as a learning roadmap:
| Session | What You Build | Key Concept Added |
|---|---|---|
| s01 | Basic agent loop: messages → LLM → tool_use check → execute → loop | One loop and Bash is enough for a surprising range of tasks |
| s02 | Tool dispatch map: View, Write, Edit, Grep handlers | Adding a tool = adding one entry. The loop itself never changes |
| s03 | TodoWrite + reminder injection after 3 idle iterations | Persistent planning prevents drift in multi-step tasks |
| s04 | Task tool: sub-agent spawning with clean context isolation | Each subtask gets a fresh context; only the summary returns to the parent |
| s05 | Skill loading from the filesystem on demand | Load domain knowledge only when needed; avoid bloating the system prompt upfront |
| s06 | Three-layer context compression | Context will fill up — make room before you run out, not after |
| s07–s12 | Disk persistence, background tasks, agent teams, git worktrees | Production hardening and parallel agent coordination at scale |
The minimum viable agent loop — the core built in session 1 — looks like this:
```python
import anthropic

client = anthropic.Anthropic()


def run_agent(task: str, tools: list, tool_handlers: dict) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=4096,
            tools=tools,  # list of tool schemas
            messages=messages,
        )
        # Append the assistant's response to the history
        messages.append({"role": "assistant", "content": response.content})
        # If the model produced no tool calls, it is done
        if response.stop_reason != "tool_use":
            # Default to "" so a response without a text block does not raise StopIteration
            return next((b.text for b in response.content if b.type == "text"), "")
        # Execute each tool call and collect the results
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = tool_handlers[block.name](**block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })
        # Append tool results and loop — the model will be called again
        messages.append({"role": "user", "content": tool_results})
```
This is approximately 30 lines. Every capability described in this tutorial — file reading, editing, sub-agents, planning — is an extension of this loop. The loop itself never changes.
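The loop as written runs until the model stops requesting tools, so a model that keeps calling tools would run, and bill, indefinitely. A minimal sketch of an iteration cap follows. The names `run_bounded`, `IterationLimitExceeded`, and `step` are hypothetical: `step` stands in for one model call plus tool execution and returns the final text, or `None` if tools were run and the loop should continue.

```python
from typing import Callable, Optional


class IterationLimitExceeded(RuntimeError):
    """Raised when the loop reaches its cap without a final answer."""


def run_bounded(
    step: Callable[[list], Optional[str]],
    messages: list,
    max_iterations: int = 25,
) -> str:
    """Call step(messages) until it returns text, at most max_iterations times."""
    for _ in range(max_iterations):
        final = step(messages)
        if final is not None:
            return final
    raise IterationLimitExceeded(f"no final answer after {max_iterations} iterations")


# Stand-in for a real model call: finishes on the third invocation.
def fake_step(messages: list) -> Optional[str]:
    messages.append({"role": "assistant", "content": "working"})
    return "done" if len(messages) >= 3 else None


print(run_bounded(fake_step, []))  # prints "done"
```

Raising an exception rather than returning a partial answer makes the cap visible to the caller, who can then decide whether to retry, summarize, or stop. The default of 25 is arbitrary and should be tuned per task.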
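A practical starting point: Anthropic's Claude Agent SDK (the claude_agent_sdk Python package) exposes the agent loop and built-in tools behind Claude Code, along with session management, so you can focus on configuring tools and prompts rather than wiring up the loop from scratch. A minimal use looks roughly like this; option names can change between releases, so check the SDK documentation for your installed version:
import asyncio
from claude_agent_sdk import ClaudeAgentOptions, query

async def main() -> None:
    options = ClaudeAgentOptions(
        allowed_tools=["Read", "Edit", "Write", "Grep", "Bash"],
        system_prompt=(
            "You are a helpful coding assistant. "
            "Always prefer surgical edits over rewriting entire files."
        ),
    )
    # query() yields messages as the agent works through the task
    async for message in query(
        prompt="Add type hints to all functions in src/utils.py and run mypy to verify",
        options=options,
    ):
        print(message)

asyncio.run(main())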
Start here for your first agent. Add custom tools — following the dispatch map pattern — as your specific use case requires.
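For the hand-written `run_agent` loop shown earlier, the dispatch map pattern can be sketched as a single registry that produces both arguments the loop expects. The names `REGISTRY`, `register`, and the two example tools are hypothetical. The schema fields (`name`, `description`, `input_schema`) follow the shape the Anthropic Messages API expects for tools.

```python
from typing import Any, Callable

# Hypothetical registry: tool name -> (schema sent to the API, Python handler).
REGISTRY: dict[str, tuple[dict[str, Any], Callable[..., str]]] = {}


def register(name: str, description: str, properties: dict[str, Any], required: list[str]):
    """Decorator that adds one tool (schema plus handler) to the registry."""
    def wrap(fn: Callable[..., str]) -> Callable[..., str]:
        schema = {
            "name": name,
            "description": description,
            "input_schema": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        }
        REGISTRY[name] = (schema, fn)
        return fn
    return wrap


@register(
    "read_file",
    "Read a UTF-8 text file and return its contents.",
    {"path": {"type": "string", "description": "Path to the file"}},
    ["path"],
)
def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()


@register(
    "word_count",
    "Count whitespace-separated words in a string.",
    {"text": {"type": "string", "description": "Text to count"}},
    ["text"],
)
def word_count(text: str) -> str:
    return str(len(text.split()))


# The two arguments the loop takes, derived from the same registry.
tools = [schema for schema, _ in REGISTRY.values()]
tool_handlers = {name: fn for name, (_, fn) in REGISTRY.items()}
```

Keeping the schema and the handler in one place means a new tool is one decorated function, and the two structures cannot drift apart.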
Summary#
| Pattern | What It Solves | Key Constraint to Remember |
|---|---|---|
| Master agent loop | Drives all autonomous behavior: the model decides what to do next by calling tools or producing a final answer | Always set a max iteration limit — a stuck model will consume your entire API budget without one |
| Specialized tool suite | Gives the model safe, scoped actions. Path sandboxing prevents writes outside the project. Risk classification gates destructive bash commands | Tool descriptions are the model's interface — vague descriptions lead to wrong tool selection and incorrect arguments |
| Tool dispatch map | Makes the system extensible: adding a new capability = adding one handler + one schema entry. The loop never changes | The schema description must accurately reflect what the tool does and when to use it — treat it like public API documentation |
| Tool lazy loading (ToolSearch) | Saves thousands of context tokens per API call by deferring infrequently-used tool schemas. The model discovers tools on demand via ToolSearch | Core tools (Read, Edit, Bash) must always be available. Only defer tools that are rarely needed — deferring a frequently-used tool adds friction |
| Context compression pipeline | A cascading pipeline (tool result budget → history snip → microcompact → context collapse → auto-compact) keeps context lean. Cheap stages fire first; full compaction is the last resort. CLAUDE.md persists knowledge across sessions outside the pipeline entirely | Full auto-compact is expensive and lossy — let cheap stages handle routine pressure. CLAUDE.md is the only layer you directly control; keep it concise since it loads on every session start |
| TodoWrite | Prevents plan drift across many loop iterations. Reminder injection steers the model back to its plan if it gets distracted | Only one task in_progress at a time — parallelism requires spawning sub-agents, not concurrent in-loop planning |
| Sub-agents and background tasks | Isolates context-heavy work via sub-agents (sync, background, worktree-isolated). Background mode runs tasks concurrently without blocking the user. Coordinator mode enables full multi-agent orchestration for large-scale tasks | Sub-agents cannot spawn their own sub-agents — two levels max. Background tasks add polling complexity; use synchronous mode for anything under 30 seconds |
| Cross-session memory | Persists user preferences, feedback, project context, and external references across sessions as markdown files on disk. Semantic recall surfaces up to 5 relevant memories per query | Only store what cannot be derived from the current project state. Memory descriptions must be specific and concrete — a poorly described memory is effectively invisible to recall |
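The path sandboxing mentioned in the tool suite row can be illustrated with a small check. This is a minimal sketch, not Claude Code's actual implementation: `resolve_in_project` is a hypothetical helper, and it assumes Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_in_project(project_root: Path, requested: str) -> Path:
    """Resolve a tool-supplied path and reject it if it falls outside the project."""
    root = project_root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    # An absolute `requested` replaces root in the join, so it is checked as-is.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"{requested!r} is outside {root}")
    return candidate
```

A file-writing tool handler would call this on its path argument before opening the file, so that inputs such as `../secrets.txt` or an absolute path elsewhere on disk raise `PermissionError` instead of being written.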
The lesson from studying coding agents is that powerful agents do not require complex orchestration. They require a reliable loop, well-designed tools (Patterns 1–2), active context management (Pattern 3), structured planning (Pattern 4), the ability to delegate (Pattern 5), and persistence across sessions (Pattern 6). These concerns compound: a reliable loop means predictable debugging; well-specified tools mean fewer model errors; lean context means consistent reasoning across long tasks; memory means the agent improves over time. Start with the loop. Get the tools right. Add compression and planning. Then extend outward to delegation and memory as your use cases demand.
Sources:
- Claude Code Behind-the-Scenes: Master Agent Loop — PromptLayer Blog
- Tracing Claude Code's LLM Traffic: Agentic Loop, Sub-agents, Tool Use — George Sung, Medium
- Claude Code Agent Architecture: Single-Threaded Master Loop — ZenML LLMOps Database
- learn-claude-code — shareAI-lab, GitHub
- How the Agent Loop Works — Claude API Docs
- Claude Code Overview — Official Docs