Patterns in Claude Code

The previous sections covered what AI agents are and how to build the harness that turns a language model into one — the execution loop, tool dispatch, context management, guardrails, and observability. Claude Code is a production implementation of all of those ideas: Anthropic's official agentic CLI that reads your codebase, edits files, runs shell commands, and iterates autonomously until a task is complete.

What makes Claude Code worth studying is not just that it works, but how it is built. It is a concrete case study of the harness engineering principles from the previous chapter — applied at scale to a real coding agent. Understanding its patterns gives you a clear mental model for using Claude Code effectively, and a practical blueprint for building similar systems yourself.

New to Claude Code? If you want a hands-on introduction to installing and using Claude Code, start with the Getting Started with Claude Code tutorial first, then come back here for the deep dive into its architecture and design patterns.

What Is Claude Code?

Claude Code is a command-line tool (claude) that runs in your terminal, pointed at a project directory. Unlike a chatbot that produces a single response, Claude Code is an autonomous coding agent: it reads files, understands codebase structure, and takes sequential actions — editing files, running tests, searching for patterns — until the task is done or it needs to ask you something.

cd your-project
claude "Add input validation to all API endpoints and write tests for each one"

After that command, Claude Code will read relevant files, make targeted edits across multiple files, run the test suite to verify its work, and fix any failures — all without you directing each step.

Claude Code's Operating Scope

Claude Code is not a chat interface with file access bolted on. It is an agent loop that drives a purpose-built tool suite. The model decides what to read, what to edit, and what to run, and the tools themselves execute locally on your machine.

A note on availability: Claude Code is available as a terminal command, as VS Code and JetBrains extensions, and as a desktop application. It can also run inside GitHub Actions for automated coding tasks in CI/CD pipelines.

The rest of this chapter walks through the master agent loop and the six patterns built on top of it: tool dispatch, lazy loading, context compression, structured planning, sub-agent delegation, and cross-session memory. Claude Code's actual implementation is more sophisticated than what we describe here, but the core ideas are the same, and understanding them will help you both use Claude Code more effectively and build similar agents yourself.

The Master Agent Loop

Every capability of a coding agent — reading files, writing edits, running tests, spawning sub-agents — is powered by a single, repeating pattern called the master agent loop.

The loop works like this: every response from the language model either contains a tool_use block (the model wants to call a tool) or it does not (the model considers its task done). If the model called a tool, execute the tool, append the result to the conversation history, and call the model again. Repeat until the model produces a response with no tool calls. The entire system reduces to roughly 30 lines of logic.

The Master Agent Loop

The complete core of a coding agent. Every file read, every edit, every bash command, every sub-agent spawn passes through this same loop. Adding a new capability means adding one tool handler — the loop itself never changes.

Why this design wins: The simplicity of the loop is its strength. Some agent systems use complex state machines, branching agent graphs, and parallel reasoning threads. A single-threaded loop is easier to debug, easier to extend, and easier to reason about. The core insight: the model is the agent — the system's job is to give it good tools and stay out of the way.

The Tool Suite

Rather than giving the agent unrestricted access to the shell, the system provides a set of purpose-built, sandboxed tools — each designed for a specific category of action and constrained to prevent accidental damage. This is an important design choice: a narrowly defined tool is easier for the model to use correctly, produces predictable results, and can have safety rules enforced at the tool boundary rather than relying solely on the model's judgment.

Category	Tool	What It Does	Why a Dedicated Tool?
Reading	`View` (Read)	Read file contents, e.g., up to 2000 lines by default	Enforces read-only semantics — cannot accidentally write when reading
Reading	`LS`	List directory contents	Scoped to the project; cannot accidentally list sensitive system directories
Searching	`Glob`	Wildcard file search (`**/*.ts`, `src/**/*.py`)	Faster and more structured than bash `find` for pattern matching
Searching	`Grep`	Full regex content search across files	The model writes its own regex patterns — no external index or vector database needed
Editing	`Edit`	Surgical replacement: old string → new string	Minimal blast radius; only the specified text changes, nothing else
Editing	`Write`	Whole-file create or overwrite	Used only when creating new files — Edit is preferred for changes to existing files
Execution	`Bash`	Run shell commands in a persistent session	Risk-classified before execution; write operations and deletions require confirmation
Planning	`TodoWrite`	Create and update a structured task list	Prevents the model from losing track of its plan across many loop iterations
Delegation	`Task`	Spawn a focused sub-agent with a clean context	Offloads context-heavy exploration; only a text summary returns to the parent
Parallel	`BatchTool`	Execute multiple tools simultaneously in one round-trip	Reduces sequential overhead when multiple independent reads are needed at once

The key insight — regex over vector databases: A common assumption is that a coding agent needs an embedding index to efficiently search a large codebase. It does not. The model writes its own regex patterns and uses Grep to search code directly. This eliminates the complexity of maintaining an index, works on any codebase without setup, and handles code that has never been indexed. The tradeoff is that pattern-based search requires the model to know what to look for — which works well for concrete code structures (function names, imports, error codes) but less well for abstract semantic queries like "find all code related to authentication" when there is no clear pattern to match.
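
To make the regex-over-index idea concrete, here is a minimal sketch of what a Grep-style handler could look like. It is not Claude Code's actual implementation: the function name `run_grep` mirrors the handler named later in this chapter, while the `max_matches` cap and the rule of skipping hidden directories are assumptions made for illustration.

```python
import os
import re


def run_grep(pattern: str, path: str = ".", max_matches: int = 50) -> str:
    """Search files under `path` for a regex and return 'file:line: text' matches."""
    regex = re.compile(pattern)
    matches = []
    for root, dirs, files in os.walk(path):
        # Skip hidden directories such as .git (an assumption, not a documented rule)
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for name in sorted(files):
            full = os.path.join(root, name)
            try:
                with open(full, encoding="utf-8") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if regex.search(line):
                            matches.append(f"{full}:{lineno}: {line.rstrip()}")
                            if len(matches) >= max_matches:
                                return "\n".join(matches)
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file: ignore it
    return "\n".join(matches) if matches else "No matches found."
```

No index has to be built or refreshed: the search always runs against the files as they exist right now, which is why it keeps working immediately after the agent edits them.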

Tool safety is enforced by path sandboxing: every file path is validated against the working directory. A tool call that attempts to read or write outside the project root is rejected. Bash commands classified as risky (file deletion, network operations, system modifications) trigger a permission prompt before execution.
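
The two checks described above can be sketched in a few lines. This is an illustrative approximation only: the helper names `resolve_in_sandbox` and `needs_confirmation` and the `RISKY_PREFIXES` list are hypothetical, and a real risk classifier would need to be considerably more thorough than a prefix match.

```python
from pathlib import Path

# Illustrative list only; a production classifier would be far more thorough
RISKY_PREFIXES = ("rm ", "curl ", "wget ", "sudo ", "chmod ", "mv ")


def resolve_in_sandbox(root: str, requested: str) -> Path:
    """Resolve `requested` relative to `root` and reject paths that escape it."""
    root_path = Path(root).resolve()
    candidate = (root_path / requested).resolve()  # resolve() collapses ".." segments
    if candidate != root_path and root_path not in candidate.parents:
        raise PermissionError(f"Path escapes project root: {requested}")
    return candidate


def needs_confirmation(command: str) -> bool:
    """Return True if the command starts with a prefix we treat as risky."""
    stripped = command.strip()
    return any(stripped.startswith(prefix) for prefix in RISKY_PREFIXES)
```

Resolving the path before comparing it matters: a check on the raw string would let `src/../../secrets` through, while the resolved form makes the escape visible.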

Pattern 1: The Tool Dispatch Map

The agent loop needs a way to route each tool call to the right handler function. The tool dispatch map solves this with a simple dictionary that connects tool names (strings) to handler functions. When the model calls a tool by name, the loop looks up that name in the map and calls the corresponding function. Adding a new capability requires exactly two things: a handler function and an entry in the map.

TOOL_HANDLERS = {
    "bash":       lambda **kw: run_bash(kw["command"]),
    "view":       lambda **kw: run_read(kw["path"], kw.get("limit")),
    "write_file": lambda **kw: run_write(kw["path"], kw["content"]),
    "edit_file":  lambda **kw: run_edit(kw["path"], kw["old_text"], kw["new_text"]),
    "glob":       lambda **kw: run_glob(kw["pattern"]),
    "grep":       lambda **kw: run_grep(kw["pattern"], kw.get("path")),
    "todo_write": lambda **kw: run_todo(kw["todos"]),
}

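# The loop never changes; only the dispatch map does
def execute_tool(tool_name: str, tool_inputs: dict) -> str:
    handler = TOOL_HANDLERS.get(tool_name)
    if handler is None:
        return f"Error: unknown tool '{tool_name}'"
    try:
        return str(handler(**tool_inputs))
    except Exception as exc:
        # Report failures (for example a missing argument) back to the model
        # as text so it can correct the call instead of crashing the loop
        return f"Error: {type(exc).__name__}: {exc}"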

Each tool also has a schema — a JSON description that tells the model what tools exist and how to call them. Before calling a tool, the model reads these schemas and generates a valid JSON argument object that matches the declared structure:

TOOL_SCHEMAS = [
    {
        "name": "edit_file",
        "description": "Make a targeted edit to a file by replacing old_text with new_text. "
                       "Use this for surgical changes — do not rewrite the whole file unless necessary.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path":     {"type": "string", "description": "Absolute path to the file"},
                "old_text": {"type": "string", "description": "The exact text to replace"},
                "new_text": {"type": "string", "description": "The replacement text"},
            },
            "required": ["path", "old_text", "new_text"],
        },
    },
    # ... other tools
]

The description field is critical: it is the primary signal the model uses to decide when to call a tool and what arguments to provide. A vague description causes the model to misuse the tool or choose the wrong one. Treat tool descriptions with the same care as a public API contract — they are the interface between your code and the model's reasoning.

Tool Design Principle	Good Example	Bad Example
Specific names	`search_product_catalog(query, max_results)`	`search(input)` — the model cannot tell what this searches
Scoped return values	Return the top 5 results with only the fields the model needs	Return the entire raw API response — bloats the context with irrelevant fields
Atomic operations	`create_calendar_event(title, time, attendees)` — does one thing	`manage_calendar(action, ...)` — the action parameter makes it ambiguous
Idempotent where possible	Read operations: always safe to retry	Write operations without deduplication: retries create duplicate records
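
As a small illustration of the "scoped return values" row, the sketch below trims a raw result list before it is handed back to the model. The helper name `scope_results` and the field names in the usage note are hypothetical.

```python
import json


def scope_results(raw_items: list, fields: tuple, max_results: int = 5) -> str:
    """Keep only the first `max_results` items and only the listed fields."""
    trimmed = [
        {key: item[key] for key in fields if key in item}
        for item in raw_items[:max_results]
    ]
    return json.dumps(trimmed)
```

A handler such as the `search_product_catalog` example could end with `return scope_results(api_response, ("id", "name", "price"))`, so the model sees five compact records rather than the full payload.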

Pattern 2: Tool Lazy Loading (ToolSearch)

A coding agent might have 40+ built-in tools and potentially hundreds more from external plugins (MCP servers). Sending every tool's full schema — name, description, and parameter definitions — to the model on every API call wastes thousands of context tokens on tools the model will never use in a given conversation. Most tasks only need a handful of tools. The rest are dead weight in the prompt.

Tool lazy loading solves this by splitting tools into two tiers: a small set of always-available tools (Read, Edit, Bash, Grep, etc.) and a larger set of deferred tools whose schemas are only loaded when explicitly requested. The model sees the names of deferred tools in a brief list, but not their full schemas. When it needs one, it calls ToolSearch to load that tool's schema into context — after which the tool is available for the rest of the session.

Lazy Tool Loading via ToolSearch

Instead of sending 40+ full tool schemas on every API call, the harness sends only the core tools upfront. Deferred tools are listed by name only. When the model needs a deferred tool, it calls ToolSearch, which returns a tool_reference block that expands the full schema into context. Once discovered, the tool remains available for the rest of the session.

The lifecycle works like this:

Turn 1: The model sees core tools (Read, Edit, Bash, Grep, Glob, ToolSearch) with full schemas, plus a system reminder listing deferred tool names only
Turn N: The model decides it needs a deferred tool (e.g., WebFetch) and calls ToolSearch({ query: "select:WebFetch" })
ToolSearch returns a tool_reference block — a special response type that tells the API to expand that tool's full schema into context
Turn N+1 onward: The tool is now fully available. The harness tracks discovered tools and includes them in all subsequent API calls automatically

Surviving compaction: When context compression (Pattern 3) replaces the conversation history with a summary, the message containing the original tool_reference block may be discarded. To prevent losing access to discovered tools, the compaction process preserves a list of all previously discovered tool names. These tools continue to be included in subsequent API calls even after their discovery messages are gone.
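
The harness-side bookkeeping for this lifecycle can be sketched as a small registry. This is a simplified model under stated assumptions: the class name `ToolRegistry` and its methods are hypothetical, and the sketch tracks discovery in local state instead of reproducing the API-level tool_reference mechanism.

```python
class ToolRegistry:
    """Tracks which tool schemas are sent with each request."""

    def __init__(self, core: dict, deferred: dict):
        self.core = core                  # name -> full schema, always sent
        self.deferred = deferred          # name -> full schema, sent once discovered
        self.discovered: set = set()      # kept outside the message history

    def deferred_names(self) -> list:
        """Names announced to the model without their schemas."""
        return sorted(n for n in self.deferred if n not in self.discovered)

    def tool_search(self, query: str) -> str:
        """Handle a 'select:<ToolName>' query by marking the tool as discovered."""
        name = query.removeprefix("select:")
        if name not in self.deferred:
            return f"Error: no deferred tool named '{name}'"
        self.discovered.add(name)
        return f"Loaded tool '{name}'"

    def schemas_for_request(self) -> list:
        """Core schemas plus every schema discovered so far."""
        return list(self.core.values()) + [self.deferred[n] for n in sorted(self.discovered)]
```

Because `discovered` lives in the registry and not in the conversation, replacing the message history with a summary does not remove access to tools that were already loaded.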

Pattern 3: Context Compression Pipeline

The master agent loop appends every tool result to the conversation history. The tool dispatch map (Pattern 1) lets the model call dozens of tools. Together, they create a problem: context grows with every iteration. A single 1,000-line source file is typically on the order of 10,000 tokens. A coding task that reads 10 files, runs 5 commands, and makes 20 edits can easily accumulate 100,000+ tokens, which crowds or exceeds many context windows and makes every model call more expensive.

Rather than a single compaction step, production coding agents use a multi-stage pipeline that runs before every API call. Each stage is progressively more aggressive — lightweight stages fire first, and the heavyweight full-compaction step only triggers if the cheaper stages were not enough. This cascading design minimizes latency in the common case while still handling worst-case context overflow.

The Context Compression Pipeline

Before every model call, the conversation passes through a cascade of reduction stages. Each stage is cheaper and less lossy than the next. Only when cheaper stages cannot bring context under the limit does the next stage fire. This ensures most requests pay near-zero compaction cost.

How Each Stage Works

Stage	Trigger	What It Does	Cost
Tool result budget	Every request	If tool results in a single message exceed 200K characters total, persists the largest results to disk and replaces with short previews (file path + first 2KB)	Zero — no API call, just local file I/O
History snip	Token count exceeds snip threshold	Drops the oldest messages entirely, without generating any summary. Records a snip boundary so the model knows context was removed	Zero — no API call, just array slicing
Microcompact	Every request (two strategies)	Time-based: if 60+ minutes since last activity, clears all but the 5 most recent tool results (cache is cold anyway). Count-based: if tool result count exceeds threshold, queues cache-edit deletions on the server side without modifying local messages	Zero for time-based (local mutation). Near-zero for count-based (server-side cache edit, no LLM call)
Context collapse	Token count exceeds collapse threshold	Identifies individual conversation segments that can be summarized independently. Each segment gets its own mini-summary, preserving more granular context than full compaction	One LLM call per segment — cheaper than full compaction since segments are small
Auto-compact	Token count exceeds auto-compact threshold (~92% of effective window)	Calls Claude to generate a structured 9-section summary of the entire conversation, then replaces all prior messages with that summary plus post-compact attachments	One full LLM call with the entire conversation as input — the most expensive stage
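
The cascading idea can be sketched with two cheap stages and a driver that stops as soon as the conversation fits. This is a simplified model, not the production pipeline: the stage functions, the 4-characters-per-token estimate, and the thresholds are all assumptions, and the LLM-backed stages (context collapse, auto-compact) are left out.

```python
def estimate_tokens(messages: list) -> int:
    # Rough heuristic (an assumption): about 4 characters per token
    return sum(len(m["content"]) for m in messages) // 4


def truncate_large_results(messages: list, max_chars: int = 2000) -> list:
    """Cheapest stage: replace oversized contents with a short preview."""
    out = []
    for m in messages:
        content = m["content"]
        if len(content) > max_chars:
            content = content[:max_chars] + "\n[truncated preview]"
        out.append({**m, "content": content})
    return out


def snip_oldest(messages: list, keep_last: int = 4) -> list:
    """More aggressive stage: drop the oldest messages and leave a boundary marker."""
    if len(messages) <= keep_last:
        return messages
    marker = {"role": "user", "content": "[earlier messages were removed]"}
    return [marker] + messages[-keep_last:]


def compress(messages: list, budget_tokens: int, stages: list) -> list:
    """Run stages from cheapest to most aggressive until the budget is met."""
    for stage in stages:
        if estimate_tokens(messages) <= budget_tokens:
            break  # cheaper stages were enough; skip the rest
        messages = stage(messages)
    return messages
```

Calling `compress(history, budget, [truncate_large_results, snip_oldest])` before each model call means a conversation that already fits pays only the cost of one token estimate.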

The Auto-Compact Summary Structure

When full compaction fires, it does not produce a generic "here's what happened" paragraph. It generates a structured 9-section summary designed to preserve everything the model needs to continue working effectively:

Primary Request and Intent — all of the user's explicit goals
Key Technical Concepts — technologies, frameworks, patterns discussed
Files and Code Sections — specific files examined/modified/created, with code snippets
Errors and Fixes — errors encountered and how they were resolved
Problem Solving — problems solved and ongoing troubleshooting
All User Messages — every non-tool-result user message (critical for understanding changing intent)
Pending Tasks — tasks the user explicitly asked to work on
Current Work — what was being worked on immediately before compaction
Optional Next Step — the next step in line with the user's most recent request

This structure ensures that even after compaction, the model knows what the user wants, what has been tried, what files are involved, and what to do next. The summary is generated with a chain-of-thought scratchpad (stripped before storage) to improve quality.
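
One way to express the nine sections and the scratchpad step in code is shown below. The section names come from the list above; the prompt wording and the `<scratchpad>` tag name are assumptions made for this sketch.

```python
import re

SUMMARY_SECTIONS = [
    "Primary Request and Intent",
    "Key Technical Concepts",
    "Files and Code Sections",
    "Errors and Fixes",
    "Problem Solving",
    "All User Messages",
    "Pending Tasks",
    "Current Work",
    "Optional Next Step",
]


def build_summary_prompt() -> str:
    """Ask for reasoning in a scratchpad, then a summary with fixed sections."""
    numbered = "\n".join(f"{i}. {name}" for i, name in enumerate(SUMMARY_SECTIONS, start=1))
    return (
        "Summarize the conversation so far. First reason inside <scratchpad> tags, "
        "then write the summary with exactly these sections:\n" + numbered
    )


def strip_scratchpad(model_output: str) -> str:
    """Remove the reasoning block so only the summary is stored."""
    return re.sub(r"<scratchpad>.*?</scratchpad>", "", model_output, flags=re.DOTALL).strip()
```

The scratchpad gives the model room to reason about what matters before committing to the summary, and stripping it keeps that reasoning from consuming context afterwards.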

Post-Compact Recovery

After compaction replaces the conversation history, the harness regenerates several attachments that the model needs to continue working:

File attachments — re-reads the 5 most recently accessed files (so the model does not need to re-read them)
Plan attachment — current todo list, if one exists
Deferred tools list — re-announces all deferred tools (since the original announcements were in discarded messages)
Background agent status — status of any running sub-agents
MCP server instructions — re-announces plugin configurations

This means the model resumes after compaction with its working set intact — it does not start from zero.
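
A rough sketch of the recovery step follows. The function name, the bracketed labels, and the plain-dict message shape are assumptions for illustration, and the background-agent and MCP attachments are omitted to keep it short.

```python
def rebuild_after_compaction(
    summary: str,
    recent_files: list,          # (path, content) pairs, oldest first
    todos: list,                 # dicts with "description" and "status"
    deferred_tool_names: list,
    max_files: int = 5,
) -> list:
    """Return the new history: the summary followed by regenerated attachments."""
    messages = [{"role": "user", "content": f"[Conversation summary]\n{summary}"}]
    # Re-attach only the most recently accessed files
    for path, content in recent_files[-max_files:]:
        messages.append({"role": "user", "content": f"[File: {path}]\n{content}"})
    if todos:
        plan = "\n".join(f"- [{t['status']}] {t['description']}" for t in todos)
        messages.append({"role": "user", "content": f"[Current plan]\n{plan}"})
    if deferred_tool_names:
        names = ", ".join(deferred_tool_names)
        messages.append({"role": "user", "content": f"[Deferred tools]\n{names}"})
    return messages
```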

CLAUDE.md: The Persistence Layer You Control

The compression pipeline handles within-session context pressure. But what about knowledge that should persist across sessions? That is the role of CLAUDE.md — a project configuration file loaded at the start of every new session, before the first model call.

CLAUDE.md is not part of the compaction pipeline — it is immune to compaction because it lives in the system prompt, not the conversation history. This makes it the right place for:

Build commands and project setup instructions
Coding conventions and architectural decisions
Constraints the model must always respect (e.g., "never modify the migration files directly")
Key file locations and project structure notes

Think of CLAUDE.md as the agent's persistent memory that you control directly. The compaction pipeline handles everything else automatically.
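
In a harness you build yourself, loading such a file could look like the sketch below. The helper name `build_system_prompt` and the heading text it inserts are hypothetical; only the idea of placing the file's contents in the system prompt comes from the description above.

```python
from pathlib import Path


def build_system_prompt(base_prompt: str, project_root: str) -> str:
    """Append the project's CLAUDE.md (if present) to the base system prompt."""
    claude_md = Path(project_root) / "CLAUDE.md"
    if not claude_md.is_file():
        return base_prompt
    instructions = claude_md.read_text(encoding="utf-8").strip()
    return f"{base_prompt}\n\n# Project instructions (CLAUDE.md)\n{instructions}"
```

The result would be passed as the `system` argument of each model call, separate from the `messages` list, which is why compaction of the conversation history never touches it.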

Pattern 4: Structured Planning with TodoWrite

Multi-step coding tasks have a fundamental problem: a model working across many loop iterations can lose track of its overall plan. It may complete step 1, start step 2, get distracted by an unexpected finding, and never return to step 3 — with no indication that a step was skipped.

TodoWrite addresses this by maintaining a prioritized task list with explicit status tracking, visible to the model at every iteration. Think of it as the agent writing down its plan so it can refer back to it, the same way you would with a notepad when working through a complex task.

TodoWrite: Planning Under Pressure

The model creates a structured task list at the start of a complex task and updates it after every significant action. This creates a persistent, self-correcting plan that survives across many loop iterations. Only one task can be in_progress at a time. A reminder injection mechanism steers the model back if it gets distracted.

How TodoWrite Works in Practice

Each entry in the todo list has three fields: a description (what to do), a status (pending, in_progress, or completed), and a priority (high, medium, low). The model writes the full list at the start of a complex task, then updates individual entries as it progresses — marking them in_progress when starting and completed when done.

The key constraint is that only one task can be in_progress at a time. This forces sequential focus: the model finishes what it started before moving on. If you need parallel work, that is what sub-agents are for — each sub-agent can have its own independent todo list.

The reminder injection mechanism: If the model goes three or more loop iterations without updating the todo list, the system automatically injects a <reminder>Update your todo list</reminder> tag into the next tool result. This is a concrete example of how production agent systems use soft constraints — you cannot force the model to do something, but you can reliably steer it toward the right behavior by injecting structured signals into its context at the right moment.

Survival across compaction: The todo list is preserved as a post-compact attachment (see Pattern 3). When context compression replaces the conversation history with a summary, the current plan state is re-attached so the model resumes with its task list intact — it does not lose track of what it was doing.
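
The single-in_progress rule and the reminder injection can be sketched together. This is an approximation under stated assumptions: the class and method names are hypothetical, the tool name `todo_write` follows the dispatch map earlier in the chapter, and resetting the counter after a reminder is a choice made here to avoid repeating it on every call.

```python
from dataclasses import dataclass, field

VALID_STATUSES = {"pending", "in_progress", "completed"}


@dataclass
class TodoList:
    items: list = field(default_factory=list)
    idle_iterations: int = 0  # tool calls since the last todo update

    def write(self, todos: list) -> str:
        """Replace the list, enforcing valid statuses and one in_progress task."""
        for todo in todos:
            if todo["status"] not in VALID_STATUSES:
                return f"Error: invalid status '{todo['status']}'"
        if sum(todo["status"] == "in_progress" for todo in todos) > 1:
            return "Error: only one task can be in_progress at a time"
        self.items = todos
        self.idle_iterations = 0
        return "Todo list updated"

    def after_tool_call(self, tool_name: str, result: str) -> str:
        """Append a reminder to the third consecutive non-todo tool result."""
        if tool_name == "todo_write":
            return result
        self.idle_iterations += 1
        if self.idle_iterations >= 3:
            self.idle_iterations = 0
            return result + "\n<reminder>Update your todo list</reminder>"
        return result
```

Rejecting an invalid list with an error string, instead of raising, follows the same convention as the rest of the tool layer: the model reads the error and corrects its next call.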

Pattern 5: Sub-Agents and Background Tasks

TodoWrite keeps the model focused within a single loop — but some tasks are too large for one agent to handle alone. Consider: "Find all places in this codebase that handle authentication and summarize the approach." Answering it might require reading dozens of files — potentially 50,000 tokens of content. If the main agent reads all of these itself, every file accumulates in the main context. The result: every subsequent model call becomes more expensive, slower, and more likely to lose track of earlier context.

Sub-agents solve this by isolating context-heavy work. A sub-agent is another instance of the agent loop, started with a fresh, empty context and given a focused task. It does all the reading and reasoning it needs to, then returns only a compact text summary to the parent. The parent never accumulates the raw file contents — only the final answer.

But context isolation is only half the story. Production coding agents also need background execution — the ability to run long-running tasks (test suites, builds, multi-file refactors) without blocking the user's interactive conversation. The task system unifies both patterns under a single lifecycle: pending → running → completed/failed/killed.

Sub-Agents and the Task System

The Agent tool (aliased as Task) spawns sub-agents in multiple modes: synchronous (blocks the parent until done), background (runs concurrently, parent checks in later), worktree-isolated (operates on a separate git branch), and remote (runs on a separate machine). All modes follow the same lifecycle and return text summaries to the parent.

The Task Lifecycle

Every sub-agent — whether synchronous, background, or worktree-isolated — follows the same lifecycle:

Status	What It Means	What the Parent Can Do
pending	Task is created but not yet running (e.g., queued behind other tasks)	Wait — the task will start automatically
running	Sub-agent loop is actively executing (reading files, making edits, running commands)	Poll with TaskGet to check progress, or continue other work and check later
completed	Sub-agent finished its task and produced a result	Read the output with TaskOutput — this is the text summary that enters the parent's context
failed	Sub-agent encountered an unrecoverable error (max steps exceeded, tool crashed, context overflow)	Read the error, decide whether to retry with a different approach or handle the failure manually
killed	Parent or user explicitly stopped the task (e.g., it was taking too long or going in the wrong direction)	Inspect partial output if useful, then proceed with an alternative approach
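
The lifecycle in the table can be modeled as a small state machine around a background thread. This is a sketch under stated assumptions: `BackgroundTask` and its methods are hypothetical names, and because Python threads cannot be forcibly stopped, `kill` here only records the state and discards any late result.

```python
import threading
from enum import Enum


class TaskStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    KILLED = "killed"


class BackgroundTask:
    def __init__(self, work):
        self._work = work                  # callable that returns a text summary
        self.status = TaskStatus.PENDING
        self.output = None
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        with self._lock:
            if self.status is not TaskStatus.PENDING:
                return                     # already started or killed
            self.status = TaskStatus.RUNNING
        self._thread.start()

    def _run(self) -> None:
        try:
            outcome = (TaskStatus.COMPLETED, self._work())
        except Exception as exc:
            outcome = (TaskStatus.FAILED, f"{type(exc).__name__}: {exc}")
        with self._lock:
            if self.status is TaskStatus.RUNNING:  # a kill wins over a late finish
                self.status, self.output = outcome

    def kill(self) -> None:
        with self._lock:
            if self.status in (TaskStatus.PENDING, TaskStatus.RUNNING):
                self.status = TaskStatus.KILLED

    def wait(self, timeout=None) -> TaskStatus:
        if self._thread.ident is not None:  # join only if the thread was started
            self._thread.join(timeout)
        return self.status
```

The parent can call `start()` and continue its own loop, then check `status` later, which corresponds to the background mode described above; calling `wait()` immediately gives the synchronous mode.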

Sub-Agent Permission Scoping

A critical safety feature is that sub-agents can be given restricted tool sets. The parent chooses which tools the sub-agent has access to:

Read-only sub-agents (Glob, Grep, View only) — perfect for exploration tasks where the sub-agent should investigate and report but never modify anything
Full-access sub-agents (all tools including Edit, Write, Bash) — for implementation tasks where the sub-agent needs to make changes
Custom restrictions — any combination, such as allowing Bash but only for read-only commands (ls, find, grep)

This is defense-in-depth: even if a sub-agent's reasoning goes wrong, it physically cannot take actions outside its permitted scope. An exploration sub-agent that hallucinates a desire to "fix" something it found cannot actually execute an edit.
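
With a dispatch map, scoping comes down to handing the sub-agent a filtered copy of the handlers. The sketch below assumes lowercase tool names like those in the dispatch map example; `scoped_handlers` and `execute_scoped` are hypothetical helper names.

```python
# Tool names follow the dispatch map example; the exact set is an assumption
READ_ONLY_TOOLS = frozenset({"glob", "grep", "view"})


def scoped_handlers(all_handlers: dict, allowed: frozenset) -> dict:
    """Return only the handlers a sub-agent is permitted to call."""
    return {name: fn for name, fn in all_handlers.items() if name in allowed}


def execute_scoped(handlers: dict, tool_name: str, tool_inputs: dict) -> str:
    """Dispatch a tool call, refusing anything outside the sub-agent's scope."""
    handler = handlers.get(tool_name)
    if handler is None:
        return f"Error: tool '{tool_name}' is not available to this sub-agent"
    return str(handler(**tool_inputs))
```

The restriction is enforced by what exists in the map, not by instructions in the prompt, so a sub-agent that asks for an edit tool it was never given simply receives an error string.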

Coordinator Mode: Multi-Agent Orchestration

For complex tasks that naturally decompose into independent subtasks, the system supports a coordinator mode where the main agent acts purely as an orchestrator — it does not implement anything itself but instead delegates all work to specialized worker sub-agents.

The coordinator's workflow follows four phases:

Research — spawn read-only sub-agents to explore the codebase and understand the current state
Synthesis — analyze the research results and plan the implementation approach
Implementation — spawn worker sub-agents (potentially in worktrees) to make the actual changes
Verification — spawn sub-agents to run tests, verify the changes work, and report any issues

The coordinator only has access to three tools: Agent (to spawn workers), SendMessage (to communicate with running workers), and TaskStop (to kill workers that go off-track). It cannot directly read files, edit code, or run commands — forcing it to delegate everything.

This pattern is most valuable for large-scale tasks that span many files or require multiple independent changes — the kind of work where a single agent would exhaust its context window before finishing.

Pattern 6: Cross-Session Memory

Patterns 1–5 all operate within a single conversation session: the loop, tools, compression, planning, and delegation state are all discarded when the session ends. When you start a new conversation tomorrow, the agent knows nothing about what happened today: your preferences, the corrections you gave, the project decisions you made. Without a persistence layer that spans sessions, the agent repeats the same mistakes, asks the same questions, and ignores conventions you have already established.

Cross-session memory solves this by maintaining a collection of markdown files on disk — each capturing one fact about the user, a piece of feedback, project context, or a reference to an external system. These files are loaded into future sessions so the agent starts with relevant knowledge already available, without you needing to repeat yourself.

Cross-Session Memory Architecture

Memory enters the conversation through two channels: (1) an always-on system prompt injection that loads the MEMORY.md index, and (2) per-query semantic recall that selects up to 5 relevant memory files using a lightweight side-query to a smaller model. Memories are written through three paths: direct writes by the main agent, a background extraction agent that runs after each query, and periodic session memory notes.

The Four Memory Types

Memory is not a dumping ground — it uses a strict four-type taxonomy to keep stored knowledge focused and actionable:

Type	What It Stores	Example
user	Role, goals, expertise level, preferences — who the user is and how to tailor responses	"User is a data scientist focused on observability; deep Go expertise but new to React — frame frontend explanations using backend analogues"
feedback	Corrections AND confirmations of approach — both what to stop doing and what to keep doing	"Don't mock the database in tests — prior incident where mocked tests passed but prod migration failed. Confirmed: bundled PRs preferred for refactors in this area"
project	Ongoing work, decisions, deadlines, and incidents not derivable from code or git history	"Auth middleware rewrite is driven by legal/compliance requirements around session token storage, not tech debt — scope decisions should favor compliance over ergonomics"
reference	Pointers to where information lives in external systems	"Pipeline bugs are tracked in Linear project INGEST. Oncall latency dashboard: grafana.internal/d/api-latency"

The critical design constraint: memory stores only what cannot be derived from the current project state. Code patterns, architecture, file paths, git history, and anything in CLAUDE.md are explicitly excluded — those can be read directly from the source. Memory captures the why behind decisions, the who behind preferences, and the where of external resources.

How Recall Works

When the user submits a query, the system performs semantic recall in four steps:

Scan — read all .md files in the memory directory (up to 200), parsing their YAML frontmatter (name, description, type)
Filter — exclude memories already surfaced in this session (prevents re-surfacing the same memory repeatedly)
Select — send the list of memory descriptions plus the user's query to a lightweight model (Sonnet), which picks up to 5 relevant memories based on name/description match
Inject — read the selected files and inject their content as attachments to the current query, with freshness warnings for memories older than 1 day

The description field in each memory's frontmatter is what the selector reads — it never sees the full content during selection. This means a well-written description is the difference between a memory being found and being invisible. Write descriptions as if you are writing a search result snippet: specific, concrete, and containing the keywords someone would use to look for this information.
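
The four recall steps can be sketched with a pluggable selector. Everything here is illustrative: the function names are hypothetical, the frontmatter parser handles only flat `key: value` lines, skipping MEMORY.md during the scan is an assumption, freshness warnings are omitted, and `keyword_select` is a trivial stand-in for the side-query to a lightweight model.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict:
    """Parse simple 'key: value' lines between leading '---' markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def keyword_select(query: str, candidates: list) -> list:
    """Stand-in selector: pick memories whose description shares a word with the query."""
    words = set(query.lower().split())
    return [c["file"] for c in candidates
            if words & set(c.get("description", "").lower().split())]


def recall(memory_dir: str, query: str, already_surfaced: set, select,
           max_files: int = 200, max_selected: int = 5) -> list:
    # 1. Scan: read frontmatter only; the selector never sees full contents
    candidates = []
    for path in sorted(Path(memory_dir).glob("*.md"))[:max_files]:
        if path.name == "MEMORY.md":
            continue  # assumed: the index is loaded through the system prompt
        meta = parse_frontmatter(path.read_text(encoding="utf-8"))
        candidates.append({"file": path.name, **meta})
    # 2. Filter: drop memories already surfaced in this session
    candidates = [c for c in candidates if c["file"] not in already_surfaced]
    # 3. Select: accept only file names that were actually offered
    valid = {c["file"] for c in candidates}
    chosen = [name for name in select(query, candidates) if name in valid][:max_selected]
    # 4. Inject: return (file name, full content) pairs for attachment
    already_surfaced.update(chosen)
    return [(name, (Path(memory_dir) / name).read_text(encoding="utf-8")) for name in chosen]
```

Since selection only ever sees `name`, `description`, and `type`, a memory with a vague description is effectively invisible even in this simplified version.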

What Makes This Different from RAG

Traditional retrieval-augmented generation (RAG) uses embedding vectors and similarity search over a large corpus. The memory system takes a different approach: structured files with a curated index, recalled by a language model rather than vector similarity. This has several advantages for an agent context:

No embedding infrastructure needed — just markdown files on disk
The selector understands context and intent, not just keyword similarity
Each memory has explicit metadata (type, description) that aids selection
Users can directly read, edit, and delete memory files — full transparency and control
The index (MEMORY.md) provides a human-readable overview of all stored knowledge

The tradeoff is scale: vector search handles millions of documents efficiently, while this system caps at ~200 memories. For a personal coding agent, 200 well-curated memories is more than enough — the bottleneck is quality, not quantity.

How to Build a Minimum Version

The patterns above can be implemented progressively. The open-source learn-claude-code reference project (docs) implements a coding agent's core from scratch in Python across 12 sessions, each adding one pattern on top of the previous. This progression also serves as a learning roadmap:

Session	What You Build	Key Concept Added
s01	Basic agent loop: messages → LLM → tool_use check → execute → loop	One loop and Bash is enough for a surprising range of tasks
s02	Tool dispatch map: View, Write, Edit, Grep handlers	Adding a tool = adding one entry. The loop itself never changes
s03	TodoWrite + reminder injection after 3 idle iterations	Persistent planning prevents drift in multi-step tasks
s04	Task tool: sub-agent spawning with clean context isolation	Each subtask gets a fresh context; only the summary returns to the parent
s05	Skill loading from the filesystem on demand	Load domain knowledge only when needed; avoid bloating the system prompt upfront
s06	Three-layer context compression	Context will fill up — make room before you run out, not after
s07–s12	Disk persistence, background tasks, agent teams, git worktrees	Production hardening and parallel agent coordination at scale

The minimum viable agent loop — the core built in session 1 — looks like this:

import anthropic

client = anthropic.Anthropic()

def run_agent(task: str, tools: list, tool_handlers: dict, max_steps: int = 50) -> str:
    messages = [{"role": "user", "content": task}]

    # Cap the number of iterations so a stuck model cannot loop forever
    for _ in range(max_steps):
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=4096,
            tools=tools,        # list of tool schemas
            messages=messages,
        )

        # Append the assistant's response to the history
        messages.append({"role": "assistant", "content": response.content})

        # If the model produced no tool calls, it is done
        if response.stop_reason != "tool_use":
            # Join all text blocks; yields "" if the response contains none
            return "".join(b.text for b in response.content if b.type == "text")

        # Execute each tool call and collect the results
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                handler = tool_handlers.get(block.name)
                if handler is None:
                    result = f"Error: unknown tool '{block.name}'"
                else:
                    result = handler(**block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })

        # Append tool results and loop; the model will be called again
        messages.append({"role": "user", "content": tool_results})

    return f"Stopped after {max_steps} iterations without a final answer"

This is approximately 30 lines. Every capability described in this tutorial — file reading, editing, sub-agents, planning — is an extension of this loop. The loop itself never changes.
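The loop above takes tools and tool_handlers as arguments without showing them. The following is a minimal sketch of what they might look like, following the dispatch map idea from session 2. The tool names (read_file, run_bash), their schemas, and the dispatch helper are illustrative assumptions, not the reference project's actual definitions.

```python
import subprocess
from pathlib import Path

def read_file(path: str) -> str:
    """Return the text content of a file."""
    return Path(path).read_text()

def run_bash(command: str) -> str:
    """Run a shell command and return combined stdout and stderr."""
    proc = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=60
    )
    return proc.stdout + proc.stderr

# Dispatch map: tool name -> handler. Adding a tool means adding one entry
# here and one schema below; the agent loop itself is untouched.
TOOL_HANDLERS = {
    "read_file": read_file,
    "run_bash": run_bash,
}

# Schemas in the shape the Messages API expects: name, description, input_schema.
TOOLS = [
    {
        "name": "read_file",
        "description": "Read a text file and return its contents.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "File path to read"}},
            "required": ["path"],
        },
    },
    {
        "name": "run_bash",
        "description": "Run a shell command in the project directory and return its output.",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string", "description": "Command to run"}},
            "required": ["command"],
        },
    },
]

def dispatch(name: str, tool_input: dict) -> str:
    """Look up a handler by name; report unknown tools or failures as text."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"Error: unknown tool '{name}'"
    try:
        return str(handler(**tool_input))
    except Exception as exc:
        return f"Error: {type(exc).__name__}: {exc}"
```

With these definitions, run_agent(task, TOOLS, TOOL_HANDLERS) works as written. Routing calls through a helper like dispatch is one design option: returning errors as text lets the model see a failure and try a different approach, rather than crashing the loop on the first bad argument.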

A practical starting point: Anthropic's claude_agent_sdk package wraps this loop with built-in support for common tool sets and automatic session management, so you can focus on defining your tools rather than wiring up the loop from scratch:

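import asyncio

from claude_agent_sdk import ClaudeAgentOptions, query

async def main() -> None:
    # Option names reflect the SDK at the time of writing; verify against your installed version
    options = ClaudeAgentOptions(
        allowed_tools=["Bash", "Read", "Edit", "Grep"],
        system_prompt=(
            "You are a helpful coding assistant. "
            "Always prefer surgical edits over rewriting entire files."
        ),
    )
    # query() streams messages as the agent works through the task
    async for message in query(
        prompt="Add type hints to all functions in src/utils.py and run mypy to verify",
        options=options,
    ):
        print(message)

asyncio.run(main())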

Start here for your first agent. Add custom tools — following the dispatch map pattern — as your specific use case requires.

Summary

Pattern	What It Solves	Key Constraint to Remember
Master agent loop	Drives all autonomous behavior: the model decides what to do next by calling tools or producing a final answer	Always set a max iteration limit — a stuck model will consume your entire API budget without one
Specialized tool suite	Gives the model safe, scoped actions. Path sandboxing prevents writes outside the project. Risk classification gates destructive bash commands	Tool descriptions are the model's interface — vague descriptions lead to wrong tool selection and incorrect arguments
Tool dispatch map	Makes the system extensible: adding a new capability = adding one handler + one schema entry. The loop never changes	The schema description must accurately reflect what the tool does and when to use it — treat it like public API documentation
Tool lazy loading (ToolSearch)	Saves thousands of context tokens per API call by deferring infrequently-used tool schemas. The model discovers tools on demand via ToolSearch	Core tools (Read, Edit, Bash) must always be available. Only defer tools that are rarely needed — deferring a frequently-used tool adds friction
Context compression pipeline	A cascading pipeline (tool result budget → history snip → microcompact → context collapse → auto-compact) keeps context lean. Cheap stages fire first; full compaction is the last resort. CLAUDE.md persists knowledge across sessions outside the pipeline entirely	Full auto-compact is expensive and lossy — let cheap stages handle routine pressure. CLAUDE.md is the only layer you directly control; keep it concise since it loads on every session start
TodoWrite	Prevents plan drift across many loop iterations. Reminder injection steers the model back to its plan if it gets distracted	Only one task in_progress at a time — parallelism requires spawning sub-agents, not concurrent in-loop planning
Sub-agents and background tasks	Isolates context-heavy work via sub-agents (sync, background, worktree-isolated). Background mode runs tasks concurrently without blocking the user. Coordinator mode enables full multi-agent orchestration for large-scale tasks	Sub-agents cannot spawn their own sub-agents — two levels max. Background tasks add polling complexity; use synchronous mode for anything under 30 seconds
Cross-session memory	Persists user preferences, feedback, project context, and external references across sessions as markdown files on disk. Semantic recall surfaces up to 5 relevant memories per query	Only store what cannot be derived from the current project state. Memory descriptions must be specific and concrete — a poorly described memory is effectively invisible to recall
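The path sandboxing mentioned in the tool suite row can be sketched in a few lines. This is a minimal illustration of the idea, not Claude Code's actual implementation; resolve_in_project and safe_write are hypothetical helper names, and the sketch assumes Python 3.9 or later for Path.is_relative_to.

```python
from pathlib import Path

def resolve_in_project(project_root: str, user_path: str) -> Path:
    """Resolve user_path against project_root and reject anything outside it."""
    root = Path(project_root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"Path escapes project root: {user_path}")
    return candidate

def safe_write(project_root: str, user_path: str, content: str) -> str:
    """Write content to a file only if the target stays inside the project."""
    target = resolve_in_project(project_root, user_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return f"Wrote {len(content)} characters to {target}"
```

A write handler built this way rejects both relative escapes such as ../outside.txt and absolute paths that point elsewhere, because the check runs on the fully resolved path rather than on the string the model supplied.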

The lesson from studying coding agents is that powerful agents do not require complex orchestration. They require a reliable loop, well-designed tools (Patterns 1–2), active context management (Pattern 3), structured planning (Pattern 4), the ability to delegate (Pattern 5), and persistence across sessions (Pattern 6). These concerns compound: a reliable loop means predictable debugging; well-specified tools mean fewer model errors; lean context means consistent reasoning across long tasks; memory means the agent improves over time. Start with the loop. Get the tools right. Add compression and planning. Then extend outward to delegation and memory as your use cases demand.
