4.5 Monitoring and Observability
The previous sections in this chapter covered how to limit what an agent can do (Least Privilege), how to require human confirmation before high-risk actions (Human-in-the-Loop), how to verify identity in agent-to-agent communication (Multi-Agent Trust), and how to keep each user's data isolated from others (Context Isolation). All of these are prevention controls — they aim to stop problems before they happen. Monitoring is the detection layer: the set of signals that tell you when something is going wrong, or has already gone wrong, so you can respond before the damage compounds.
For traditional software, monitoring is mostly about uptime and latency. For AI agents, it serves a different purpose: because agents take real-world actions on behalf of users, the question is not just "is the service up?" but "did the agent do something it was not supposed to do?" Answering that question requires a different kind of visibility.
Why detection matters even when prevention is in place: A well-configured agent with Least Privilege, HITL controls, and context isolation is much harder to exploit — but not impossible. A novel prompt injection technique, a logic error in tier classification, or a compromised dependency can bypass any single prevention control. Monitoring is what catches these cases. The goal is to detect and contain an incident before it escalates, not to assume that the prevention layer is infallible.
Why AI Agent Monitoring Is Different
Traditional application monitoring watches for infrastructure events: service down, database slow, CPU spiked, error rate exceeded. These signals are clear-cut — a 500 error is unambiguous, a deployment rollback is visible in metrics, and a crash produces a stack trace.
AI agent failures are often silent and semantic. "Silent" means no exception is raised and no error code is returned — the system looks healthy at the infrastructure level. "Semantic" means the failure is about meaning: the agent did something it was not supposed to do, even though every individual request technically succeeded. When a prompt injection attack redirects an agent to exfiltrate data, no exception is thrown. When a Denial-of-Wallet attack drains your API budget, the service keeps running normally. When an agent stuck in an error loop makes the same failing tool call hundreds of times, each individual request looks fine at the HTTP level.
Traditional software monitoring vs. AI agent monitoring
| Dimension | Traditional Software | AI Agents |
|---|---|---|
| Failure signal | Hard errors: exceptions, 500 responses, null pointers — visible in metrics immediately | Silent failures: prompt injection, behavioral drift, and hallucinations produce no infrastructure alert |
| What a trace represents | A chain of function and service calls with deterministic outputs | A reasoning path including retrieved documents, tool outputs, and model decisions — probabilistic |
| What metrics measure | Latency, error rate, throughput, uptime | Token usage, cost per session, tool call frequency, anomalous action sequences |
| Debugging approach | Reproduce the exact error from a stack trace | Reconstruct the full context: input, system prompt, retrieved documents, memory state, tool call sequence |
| Key new threat | Not applicable | Denial of Wallet: the service stays up while costs spiral — invisible to uptime monitoring |
| Behavioral baseline | Deterministic: same input always produces the same output | Probabilistic: normal output varies, so anomaly detection must work on patterns, not exact matches |
AI observability extends the traditional logs/metrics/traces model with three new pillars: cost attribution (tracking spend per user and session), behavioral baselines (what tool call sequences are normal for this agent), and governance (did the agent stay within its defined scope).
This difference has a practical consequence: you cannot rely on your existing application monitoring to cover AI agents. You need to instrument agent actions specifically and alert on patterns that have no equivalent in traditional software.
The Two Sides of Agent Visibility
Before implementing monitoring, it helps to understand what you are trying to achieve. Agent visibility has two competing requirements that must be balanced:
You need enough detail to detect and investigate incidents. When an agent takes an unexpected action, you need to reconstruct what it was told, which tools it called, with what parameters, in what sequence, and whether those calls succeeded. Without this level of detail, incidents are undetectable and impossible to investigate.
You cannot log everything the agent processes. Agent contexts routinely contain user messages, retrieved documents, tool call results, and model responses — all of which may include personally identifiable information (PII), credentials, financial data, medical records, or confidential business content. Logging this verbatim creates a concentrated archive of sensitive data that becomes a high-value target. A single breach of an unredacted agent log store can expose every user's sensitive content from every session.
The solution is to separate the audit trail from the content. The audit trail records what happened: which tools were called, with what parameter shapes, by which session, at what time, and whether they succeeded. Sensitive content — the actual text of messages, the full body of tool arguments, and the model's exact response — is either not logged at all or stored separately in an access-controlled store and referenced by an ID in the audit record.
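A minimal sketch of that split follows. It assumes an in-memory dict as a stand-in for the access-controlled content store; a real deployment would use encrypted storage readable only by an investigator role. The names `ContentStore` and `build_audit_record` are hypothetical. The audit record keeps only the tool name, the parameter *shape* (argument names and value types), and an opaque reference ID.

```python
import uuid
from typing import Any


class ContentStore:
    """Stand-in for an access-controlled content store (hypothetical).

    A dict keeps the sketch self-contained; production storage would be
    encrypted and restricted to an investigator role.
    """

    def __init__(self) -> None:
        self._items: dict[str, dict[str, Any]] = {}

    def put(self, content: dict[str, Any]) -> str:
        # Random, opaque reference: it reveals nothing about the content.
        ref = f"content-{uuid.uuid4().hex}"
        self._items[ref] = content
        return ref

    def get(self, ref: str) -> dict[str, Any]:
        return self._items[ref]


def build_audit_record(tool_name: str, arguments: dict[str, Any],
                       store: ContentStore) -> dict[str, Any]:
    """Split one tool call into a metadata-only audit record plus stored content."""
    content_ref = store.put({"tool_name": tool_name, "arguments": arguments})
    return {
        "tool_name": tool_name,
        # Parameter shape only: argument names and value types, never values.
        "argument_shape": {k: type(v).__name__ for k, v in sorted(arguments.items())},
        "content_ref": content_ref,
    }
```

Routine queries run against the audit record. Only an investigator who holds access to the content store can resolve `content_ref` back to the actual email address or body.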
What to Log
The following fields should be captured for every agent action. Together, they give you enough information to detect anomalies, investigate incidents, and satisfy audit requirements — without needing to log full message content.
Per-Action Fields
These fields describe a single tool call or agent operation:
Required audit fields for every agent action
| Field | Example value | Why it matters |
|---|---|---|
| event_type | tool_call, retrieval, llm_request | Categorizes the action for filtering and alerting |
| tool_name | send_email, FileEdit, delete_record | Identifies what the agent did; the single most important field for anomaly detection |
| tool_call_id | call_a3b7c9 | Links the tool call to its result; enables reconstruction of multi-step sequences |
| session_id | sess-abc123 | Groups all actions within one conversation; essential for per-session anomaly detection |
| user_id | usr-789 | Server-verified user identity (see §4.4), never a value submitted by the client |
| agent_id | agent-support-v2 | Identifies which agent configuration took the action; important in multi-agent systems |
| timestamp | 2026-04-06T12:00:00Z | ISO 8601 UTC timestamp; required for timeline reconstruction and rate calculations |
| result | success, failure, denied, pending_approval | Outcome of the action; repeated failures on the same tool are an early warning signal |
| error_type | permission_denied, timeout | Classifies failures; useful for distinguishing configuration errors from injection loops |
| risk_tier | 3 | Human-in-the-loop tier from §4.2; lets you filter and audit high-risk actions |
| input_tokens | 1200 | Token count for the request; tracks cost and detects unusually large inputs |
| output_tokens | 340 | Token count for the response; a spike without a corresponding input spike may indicate a Denial-of-Wallet attempt |
| cost_usd | 0.0045 | Cost of this call; enables per-user and per-session budget tracking |
| content_ref | content-xyz | Reference ID pointing to the full content stored separately; never include the content itself in the audit log |
These fields map onto the OpenTelemetry GenAI semantic conventions (v1.37+). If you emit them under the standard attribute names (for example gen_ai.tool.name, gen_ai.operation.name, and gen_ai.usage.input_tokens), you can route logs to any compatible observability platform, such as Datadog, Grafana, or Splunk, without rewriting your instrumentation.
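Below is a minimal sketch of the renaming step. The attribute names reflect our reading of the GenAI conventions, so check them against the spec version you target. Mapping `session_id` to `gen_ai.conversation.id` assumes that one session equals one conversation. The `app.` prefix for fields with no standard equivalent is a placeholder.

```python
from typing import Any

# Audit field -> OpenTelemetry attribute name (our reading of the GenAI
# semantic conventions, v1.37; verify against the version you target).
OTEL_ATTRIBUTE_NAMES = {
    "tool_name": "gen_ai.tool.name",
    "tool_call_id": "gen_ai.tool.call.id",
    "agent_id": "gen_ai.agent.id",
    "session_id": "gen_ai.conversation.id",  # assumes one session == one conversation
    "input_tokens": "gen_ai.usage.input_tokens",
    "output_tokens": "gen_ai.usage.output_tokens",
    "error_type": "error.type",
}

# event_type -> gen_ai.operation.name (only the tool-execution case is mapped)
OPERATION_NAMES = {"tool_call": "execute_tool"}


def to_otel_attributes(entry: dict[str, Any], prefix: str = "app.") -> dict[str, Any]:
    """Rename audit fields to standard attribute names.

    Fields without a standard equivalent (cost_usd, risk_tier, result,
    content_ref, ...) keep their name under a custom prefix. The timestamp
    is skipped because spans and log records carry their own timestamps.
    """
    attrs: dict[str, Any] = {}
    for key, value in entry.items():
        if key == "timestamp":
            continue
        if key == "event_type" and value in OPERATION_NAMES:
            attrs["gen_ai.operation.name"] = OPERATION_NAMES[value]
        elif key in OTEL_ATTRIBUTE_NAMES:
            attrs[OTEL_ATTRIBUTE_NAMES[key]] = value
        else:
            attrs[prefix + key] = value
    return attrs
```

Every value stays a primitive (string or number), so the resulting dict can be attached to a span, for example with `span.set_attributes(...)` in the OpenTelemetry Python API.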
At the Session Level
Beyond individual actions, log these fields at the session level to support aggregate analysis:
- Total tool calls in the session: an unusually high count signals a possible injection loop
- Unique tool types called: an agent that normally only calls read tools but starts calling write and delete tools is behaving outside its baseline
- Data sources accessed (by ID, not by content): for RAG-enabled agents, which knowledge bases or namespaces were queried
- Multi-agent delegation chain: if this agent was invoked by another agent, record the calling agent's ID — this trace is essential for investigating multi-agent incidents (§4.3)
- Human-in-the-loop decisions: any Tier 3 approval or denial from §4.2, with the approving user's ID and timestamp
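The session-level fields can be derived from the per-action entries. The sketch below is minimal and assumes hypothetical entry shapes: tool calls carry `tool_name`, retrievals carry `source_id`, HITL events use `event_type: "hitl_decision"` with `decision` and `approver_id`, and delegated calls carry `caller_agent_id`. The `WRITE_TOOLS` set is also illustrative.

```python
from collections import Counter
from typing import Any, Iterable

# Hypothetical: which tools have side effects. In practice this would come
# from each tool's registration metadata.
WRITE_TOOLS = frozenset({"send_email", "delete_record", "FileEdit"})


def summarize_session(entries: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate one session's audit entries into session-level fields."""
    tool_counts: Counter = Counter()
    sources: set = set()
    callers: set = set()
    hitl: list = []
    for e in entries:
        # Delegation chain: record which agent invoked this one, if any.
        if e.get("caller_agent_id"):
            callers.add(e["caller_agent_id"])
        kind = e.get("event_type")
        if kind == "tool_call":
            tool_counts[e["tool_name"]] += 1
        elif kind == "retrieval":
            sources.add(e["source_id"])  # IDs only, never retrieved content
        elif kind == "hitl_decision":
            hitl.append({k: e.get(k) for k in ("decision", "approver_id", "timestamp")})
    return {
        "total_tool_calls": sum(tool_counts.values()),
        "unique_tools": sorted(tool_counts),
        "write_tools_used": sorted(set(tool_counts) & WRITE_TOOLS),
        "data_sources": sorted(sources),
        "calling_agents": sorted(callers),
        "hitl_decisions": hitl,
    }
```

Suppose an agent whose baseline uses only read tools produces a non-empty `write_tools_used`. That is the out-of-baseline behavior described in the list above.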
Structured Log Format
The following shows what a well-formed audit log entry looks like. This is what goes into your log store — metadata only, with sensitive content stored separately:
```json
{
  "timestamp": "2026-04-06T12:00:00Z",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "event_type": "tool_call",
  "agent_id": "agent-support-v2",
  "session_id": "sess-abc123",
  "user_id": "usr-789",
  "tool_name": "send_email",
  "tool_call_id": "call_a3b7c9",
  "result": "pending_approval",
  "risk_tier": 3,
  "input_tokens": 1200,
  "output_tokens": 340,
  "cost_usd": 0.0045,
  "content_ref": "content-xyz"
}
```
Notice what is absent: the email address, subject line, body, and any other content the agent processed. The content_ref field points to the separately stored content record for investigators who need it, but most operational queries — anomaly detection, cost tracking, compliance reporting — can be answered from metadata alone.
Here is a minimal Python implementation using the standard logging module with structured output:
```python
import json
import logging
from datetime import datetime, timezone
from typing import Optional

# Dedicated audit logger. Configure a handler for it at INFO level: unless
# logging is configured, the effective level is WARNING and INFO is dropped.
audit_logger = logging.getLogger("agent.audit")


def log_agent_action(
    tool_name: str,
    tool_call_id: str,
    session_id: str,
    user_id: str,
    agent_id: str,
    result: str,
    risk_tier: int,
    input_tokens: int,
    output_tokens: int,
    cost_usd: float,
    error_type: Optional[str] = None,
    content_ref: Optional[str] = None,
) -> None:
    """
    Emit a structured audit log entry for one agent tool call.

    - content_ref points to separately stored content (never log content here)
    - user_id must come from server-side authentication, not client input
    - All timestamps are UTC ISO 8601
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": "tool_call",
        "agent_id": agent_id,
        "session_id": session_id,
        "user_id": user_id,  # server-verified; see §4.4
        "tool_name": tool_name,
        "tool_call_id": tool_call_id,
        "result": result,
        "risk_tier": risk_tier,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": cost_usd,
    }
    # Only include optional fields when they have a value
    if error_type:
        entry["error_type"] = error_type
    if content_ref:
        entry["content_ref"] = content_ref
    audit_logger.info(json.dumps(entry))
```
What NOT to Log
Knowing what to exclude is just as important as knowing what to include. The fields below must never appear in plaintext in your audit logs.
Fields that must never appear in plaintext audit logs
| What to exclude | Why | Safe alternative |
|---|---|---|
| Full user message content | May contain PII, financial data, health information, or confidential business details — logging it verbatim creates a concentrated sensitive data archive | Store in a separate access-controlled store; reference by ID in the audit record |
| Model responses (full text) | AI output often reflects the user's input — it carries the same PII risks, plus it may include confidential retrieved content | Same approach: access-controlled content store, not inline in audit logs |
| Tool call arguments (full body) | Arguments to tools like send_email, database_query, or FileEdit contain the actual sensitive data the agent is acting on | Log the tool name and a reference ID; store full arguments separately |
| API keys and tokens | Credentials in logs are a common breach vector — if an attacker gains access to your log store, all logged credentials are immediately compromised | Never log — strip them before write time using a redaction pipeline |
| Passwords (plaintext) | Plaintext passwords in any log are a critical vulnerability | Never log — if an agent ever processes a password, redact it before logging |
| Session tokens and JWTs | A valid session token found in a log file can be replayed to impersonate the user it belongs to | Log only a non-replayable session ID, not the token itself |
| Personal identifiers (SSN, card numbers) | These are highly regulated data categories under GDPR, HIPAA, and PCI-DSS — their unredacted presence in logs triggers breach notification obligations | Redact with a fixed placeholder ([REDACTED]) or a tokenized reference before write time |
The OpenTelemetry GenAI spec marks gen_ai.input.messages, gen_ai.output.messages, gen_ai.system_instructions, and gen_ai.tool.call.arguments as opt-in only, with an explicit warning that they SHOULD NOT be captured by default in production.
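The "tokenized reference" option in the table above can be sketched with a keyed hash (HMAC-SHA256). Because the token is deterministic, investigators can still correlate entries that involve the same identifier. A plain SHA-256 of a 9-digit SSN can be reversed by brute force, but an HMAC cannot be recomputed without the key. The function name and the 16-hex-character truncation are our own choices. The key should come from a secrets manager and never be stored alongside the logs.

```python
import hashlib
import hmac


def tokenize_identifier(value: str, key: bytes, kind: str = "PII") -> str:
    """Replace a sensitive identifier with a keyed, deterministic token."""
    # Normalize so "123-45-6789" and "123456789" map to the same token.
    normalized = "".join(ch for ch in value if ch.isalnum())
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    # 16 hex chars (64 bits) is ample to avoid collisions within a log store.
    return f"[{kind}:{digest[:16]}]"
```

Rotating the key breaks correlation with tokens written under the old key. Plan rotations around your log retention window.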
The safest approach is a redaction pipeline that runs before any log write. Rather than relying on every developer to remember which fields to exclude, route all log output through a filter that strips known-sensitive patterns automatically:
```python
import re

# Secret-bearing JSON keys, matched case-insensitively. The \w* prefix also
# catches variants such as "access_token" or "user_password", and the value
# pattern handles escaped quotes so no part of the value survives.
SECRET_KEY_PATTERN = re.compile(
    r'("\w*(?:password|api_key|token|authorization)"\s*:\s*)"(?:[^"\\]|\\.)*"',
    re.IGNORECASE,
)

# Patterns that indicate credential or PII data
REDACT_PATTERNS = [
    (SECRET_KEY_PATTERN, r'\1"[REDACTED]"'),
    # Credit card: 13–16 consecutive digits. Misses numbers written with
    # spaces or dashes, and can also match long numeric IDs or timestamps.
    (re.compile(r'\b\d{13,16}\b'), '[CARD-REDACTED]'),
    # US SSN
    (re.compile(r'\b\d{3}-\d{2}-\d{4}\b'), '[SSN-REDACTED]'),
]


def redact_log_entry(raw: str) -> str:
    """
    Apply redaction patterns to a serialized log entry string.
    Called before the entry is written to any log destination.

    This is a last-resort safety net, not the only defense. The primary
    defense is never constructing log entries that contain sensitive
    fields in the first place.
    """
    result = raw
    for pattern, replacement in REDACT_PATTERNS:
        result = pattern.sub(replacement, result)
    return result
```
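The next sketch shows how to route *all* output through redaction at the logging layer, using Python's standard `logging.Filter`. It also narrows the card-number match with a Luhn checksum, because a bare 13–16 digit pattern also matches 13-digit epoch-millisecond timestamps. The names `redact`, `luhn_valid`, and `RedactingFilter` are our own. The filter goes on handlers rather than loggers: a logger's filters do not see records propagated from child loggers, whereas a handler's filters see every record that handler emits.

```python
import logging
import re

_SECRET_KV = re.compile(
    r'("\w*(?:password|api_key|token|authorization)"\s*:\s*)"(?:[^"\\]|\\.)*"',
    re.IGNORECASE,
)
# Candidate card numbers: 13-19 digits, optionally separated by spaces/dashes.
_CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: rejects most digit runs that are not card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str) -> str:
    text = _SECRET_KV.sub(r'\1"[REDACTED]"', text)

    def _card(m):
        digits = re.sub(r"[ -]", "", m.group(0))
        return "[CARD-REDACTED]" if luhn_valid(digits) else m.group(0)

    text = _CARD_CANDIDATE.sub(_card, text)
    return _SSN.sub("[SSN-REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Redacts each record's formatted message before a handler emits it.

    Exception and stack-trace text (exc_info) is not covered by this sketch.
    """

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is now fully formatted
        return True
```

Attach one instance to each handler, for example `handler.addFilter(RedactingFilter())`. Then a developer who forgets to call a redaction helper still cannot write an unredacted record to that destination.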
Logging Sensitive Content in Agent Audit Trails
Severity: High. An agent that logs full tool call arguments, user messages, or model responses into a shared audit store exposes every user's sensitive data to anyone with log read access, and to any attacker who breaches the log store.
For example, when the audit entry for an email-sending tool call includes the full tool arguments, it also captures the email body and any personal data it contains. Every engineer with log access can then read every user's email content.
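An allowlist is a structural guard against this mistake. It only lets named audit fields into an entry, so any unexpected key, such as the full tool arguments, is dropped rather than logged. The sketch below is minimal: the field set mirrors the per-action table and JSON example above, and `build_audit_entry` is a hypothetical helper. It returns the names of dropped keys (never their values) so tests can flag the mistake.

```python
from typing import Any

# Only these keys may appear in an audit entry; everything else is dropped.
AUDIT_FIELDS = frozenset({
    "timestamp", "trace_id", "event_type", "agent_id", "session_id",
    "user_id", "tool_name", "tool_call_id", "result", "error_type",
    "risk_tier", "input_tokens", "output_tokens", "cost_usd", "content_ref",
})


def build_audit_entry(fields: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Keep allowlisted fields; report dropped field names (not values)."""
    entry = {k: v for k, v in fields.items() if k in AUDIT_FIELDS}
    dropped = sorted(k for k in fields if k not in AUDIT_FIELDS)
    return entry, dropped
```

An allowlist fails safe: a new, unreviewed field stays out of the log until someone adds it deliberately. A denylist fails open, because a new field is logged until someone remembers to block it.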
Early Warning Signals
Monitoring is only useful if you know what to alert on. The four signals below cover the most common AI agent abuse patterns, and each one corresponds to a specific threat covered elsewhere in this guide.
1. Unusual Spike in Tool Calls Per Session
What it signals: A session making far more tool calls than normal is a strong indicator of a prompt injection loop — a successful injection that causes the agent to take repeated actions, often trying to reach a goal the attacker specified rather than what the user actually asked for. In June 2025, a vulnerability in Microsoft 365 Copilot (CVE-2025-32711) showed how a single crafted email, ingested during routine summarization, could cause Copilot to autonomously call tools to access OneDrive, SharePoint, and Teams files in sequence — an action chain that would have been visible as an abnormal tool call spike if a behavioral baseline had been in place.
How to detect it: Establish a baseline for tool calls per session for each agent type. Alert when a session exceeds a multiple of the normal 95th percentile (p95 — the value that 95% of sessions fall below). For example, a customer support agent that normally makes 3–5 tool calls per session should trigger an investigation if a single session reaches 30.
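```python
def check_tool_call_rate(session_id: str, agent_id: str) -> bool:
    """
    Returns True if this session's tool call count is anomalously high.
    Compares against the p95 for this agent type over the past 7 days.

    audit_store and metrics_store are application-specific clients
    (placeholders here), not a particular library's API.
    """
    current_count = audit_store.count_tool_calls(session_id)
    baseline_p95 = metrics_store.get_p95_tool_calls_per_session(
        agent_id=agent_id,
        lookback_days=7,
    )
    # A new agent type has no baseline yet; rely on the other signals
    # rather than comparing against a missing value.
    if baseline_p95 is None:
        return False
    # Alert if the session is more than 5x the p95 baseline
    return current_count > (baseline_p95 * 5)
```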
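Here is one way the p95 baseline behind `metrics_store.get_p95_tool_calls_per_session` could be computed. The sketch assumes you have already fetched the per-session tool call counts for this agent over the lookback window. It uses `statistics.quantiles` from the standard library. The helper names are hypothetical.

```python
import statistics
from typing import Optional, Sequence


def p95(values: Sequence[float]) -> Optional[float]:
    """95th percentile of historical per-session tool call counts."""
    if len(values) < 2:
        return None  # not enough history to form a baseline
    # n=100 yields 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(values, n=100, method="inclusive")[94]


def is_tool_call_spike(current_count: int, history: Sequence[float],
                       multiplier: float = 5.0) -> bool:
    """True if current_count exceeds multiplier x the p95 baseline."""
    baseline = p95(history)
    if baseline is None:
        return False
    return current_count > baseline * multiplier
```

Take the customer support example above, where sessions normally use 3–5 tool calls. There the baseline p95 is 5, so the alert line is 25 calls, and a 30-call session trips it.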
2. AI API Cost Increasing Without a Corresponding Traffic Increase
What it signals: A Denial-of-Wallet attack (covered in §3.5) drives up your AI API spending without taking the service offline. Traditional request-count rate limits offer no protection here: an attacker using crafted inputs to maximize token usage can stay well within the request-per-minute limit while spending 100x the normal cost per request. The first sign is almost always a billing alert, not a latency or error rate alert — by which point significant spend has already occurred.
How to detect it early: Track cost per request (calculated from token counts and model pricing), cost per session, and per-user daily spend alongside request volume. Alert when the 99th percentile (p99) cost-per-request is more than 10x the median (p50), or when per-user daily spend exceeds a defined budget threshold. A per-user token budget — a hard cap on tokens consumed per hour or per day — is the most effective preventive control.
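The p99/p50 ratio check can be kept in memory over a sliding window. The sketch assumes a single process and uses the 5-minute window and 10x ratio from the thresholds table. `CostRatioMonitor` and its `min_samples` guard are our own illustrative choices. A multi-instance deployment would compute the same percentiles in its metrics backend instead.

```python
import statistics
import time
from collections import deque
from typing import Deque, Optional, Tuple


class CostRatioMonitor:
    """Sliding-window p99/p50 cost-per-request ratio (in-memory sketch)."""

    def __init__(self, window_seconds: float = 300.0, ratio_threshold: float = 10.0,
                 min_samples: int = 20) -> None:
        self.window_seconds = window_seconds
        self.ratio_threshold = ratio_threshold
        self.min_samples = min_samples
        self._samples: Deque[Tuple[float, float]] = deque()  # (timestamp, cost_usd)

    def record(self, cost_usd: float, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._samples.append((now, cost_usd))
        # Evict samples that have fallen out of the window.
        while self._samples and self._samples[0][0] < now - self.window_seconds:
            self._samples.popleft()

    def ratio(self) -> Optional[float]:
        costs = [c for _, c in self._samples]
        if len(costs) < max(self.min_samples, 2):
            return None  # too few requests for stable percentiles
        p50 = statistics.median(costs)
        if p50 <= 0:
            return None
        p99 = statistics.quantiles(costs, n=100, method="inclusive")[98]
        return p99 / p50

    def should_alert(self) -> bool:
        r = self.ratio()
        return r is not None and r > self.ratio_threshold
```

A ratio catches a small number of token-maximizing requests even when total request volume looks normal. A flat average would dilute them.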
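```python
COST_PER_1M_INPUT_TOKENS = 3.00    # example pricing, USD
COST_PER_1M_OUTPUT_TOKENS = 15.00  # example pricing, USD
MAX_TOKENS_PER_USER_PER_HOUR = 100_000


class BudgetExceededError(Exception):
    """Raised when a user exceeds their hourly token budget."""

    def __init__(self, user_id: str, consumed: int, limit: int, cost_usd: float):
        super().__init__(f"user {user_id} used {consumed} tokens (limit {limit})")
        self.user_id = user_id
        self.consumed = consumed
        self.limit = limit
        self.cost_usd = cost_usd


def calculate_and_check_cost(
    user_id: str,
    input_tokens: int,
    output_tokens: int,
) -> dict:
    """
    Calculate the cost of this request and check whether the user
    has exceeded their hourly token budget.

    token_budget is an application-specific counter store (a placeholder
    here); its increment() returns the user's running total for the hour.
    """
    cost_usd = (
        (input_tokens / 1_000_000) * COST_PER_1M_INPUT_TOKENS
        + (output_tokens / 1_000_000) * COST_PER_1M_OUTPUT_TOKENS
    )
    # Increment the user's running hourly token total
    hourly_total = token_budget.increment(user_id, input_tokens + output_tokens)
    if hourly_total > MAX_TOKENS_PER_USER_PER_HOUR:
        # The request handler should turn this into an HTTP 429 with
        # cost-specific headers, not just generic rate-limit headers
        raise BudgetExceededError(
            user_id=user_id,
            consumed=hourly_total,
            limit=MAX_TOKENS_PER_USER_PER_HOUR,
            cost_usd=cost_usd,
        )
    return {"cost_usd": cost_usd, "hourly_tokens_used": hourly_total}
```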
3. Tool Calls to Resources Outside the Agent's Expected Scope
What it signals: An agent calling tools or accessing data sources that are not part of its normal workflow is likely under the influence of a prompt injection attack — the injected instruction is directing the agent toward a resource it should have no reason to touch. A customer support agent accessing infrastructure configuration files, or a documentation assistant calling a send_email tool it has never used in 10,000 sessions, are both examples of scope-breaking behavior that warrants immediate investigation.
How to detect it: For each agent, define its expected tool set and expected data sources. Alert whenever a tool is called that is not on the expected list, or when a retrieval query targets a namespace that does not belong to the requesting user's scope (a cross-user access attempt, as discussed in §4.4).
```python
# Define the expected tool set for each agent type
AGENT_EXPECTED_TOOLS: dict[str, set[str]] = {
    "customer-support-v2": {"get_order_status", "lookup_policy", "create_support_note"},
    "documentation-assistant": {"search_docs", "get_page_content"},
    "code-review-agent": {"read_file", "list_directory", "search_codebase"},
}


def check_tool_scope(agent_id: str, tool_name: str, session_id: str) -> None:
    """
    Alert if the agent is calling a tool outside its defined scope.
    This should never happen in normal operation and warrants investigation.

    An unknown agent_id maps to an empty set, so every call it makes
    alerts (fail closed). alert_security_team is an application-specific hook.
    """
    expected = AGENT_EXPECTED_TOOLS.get(agent_id, set())
    if tool_name not in expected:
        alert_security_team(
            severity="HIGH",
            message=(
                f"Agent '{agent_id}' called unexpected tool '{tool_name}' "
                f"in session '{session_id}'. Possible prompt injection."
            ),
        )
```
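The namespace half of this signal can be checked in the same place. The sketch assumes a hypothetical naming scheme: each user's private data lives in `user:<user_id>`, and a few namespaces are deliberately shared. Isolation itself should still be enforced at the database level (§4.4). This check exists so that blocked attempts are recorded and alerted on. The function returns an alert payload containing IDs only.

```python
from typing import Any, Dict, Optional

# Hypothetical naming scheme and shared namespaces.
SHARED_NAMESPACES = frozenset({"public-docs", "product-faq"})


def check_retrieval_scope(user_id: str, session_id: str,
                          namespace: str) -> Optional[Dict[str, Any]]:
    """Return a critical alert payload if the query targets another user's data.

    user_id must be the server-verified identity (§4.4). Returns None when
    the namespace is in scope.
    """
    if namespace == f"user:{user_id}" or namespace in SHARED_NAMESPACES:
        return None
    return {
        "severity": "CRITICAL",
        "signal": "cross_namespace_retrieval",
        "session_id": session_id,
        "user_id": user_id,
        "namespace": namespace,  # an identifier, not content, so safe to log
    }
```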
4. Repeated Failures on the Same Action
What it signals: An agent that fails on the same tool call repeatedly — especially if it keeps retrying with only minor parameter variations — is stuck in an error loop. This can happen because a prompt injection has directed the agent toward a goal it cannot reach with its available tools (so it keeps trying different approaches), or because the agent is hallucinating a solution that does not actually work. Either way, a sustained pattern of same-tool failures is a sign that something unexpected has taken control of the agent's goal.
How to detect it: Track consecutive failure counts per tool per session. Alert when the same tool reaches a defined number of consecutive failures in one session. For irreversible actions in particular, block further retries automatically rather than waiting for a human to act on the alert.
```python
MAX_CONSECUTIVE_FAILURES_PER_TOOL = 3


def record_tool_result(
    session_id: str,
    tool_name: str,
    result: str,  # "success" or "failure"
) -> None:
    """
    Track consecutive failures for this tool in this session.
    Alert and optionally block when the threshold is reached.

    failure_tracker, session_store, alert_security_team and
    is_irreversible_tool are application-specific placeholders.
    failure_tracker must reset the count on success so that only
    consecutive failures are counted.
    """
    failure_count = failure_tracker.increment_if_failure(
        session_id=session_id,
        tool_name=tool_name,
        result=result,
    )
    if failure_count >= MAX_CONSECUTIVE_FAILURES_PER_TOOL:
        alert_security_team(
            severity="MEDIUM",
            message=(
                f"Tool '{tool_name}' failed {failure_count} consecutive times "
                f"in session '{session_id}'. Possible injection loop."
            ),
        )
        # For irreversible tools, block further calls automatically
        if is_irreversible_tool(tool_name):
            session_store.block_tool_for_session(session_id, tool_name)
```
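One possible behavior for the `failure_tracker.increment_if_failure` placeholder is sketched below. It keeps an in-memory streak per (session, tool) pair. Counting `denied` as a failure is an assumption on our part: a Least Privilege denial of an injected call is exactly the pattern this signal is meant to surface. Only a success resets the streak. Other outcomes, such as `pending_approval`, leave it unchanged.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

# Results that extend a failure streak (treating "denied" as one is an assumption).
FAILURE_RESULTS = frozenset({"failure", "denied"})


class ConsecutiveFailureTracker:
    """In-memory consecutive-failure counter per (session, tool)."""

    def __init__(self) -> None:
        self._streaks: DefaultDict[Tuple[str, str], int] = defaultdict(int)

    def increment_if_failure(self, session_id: str, tool_name: str, result: str) -> int:
        """Update and return the current failure streak for this tool."""
        key = (session_id, tool_name)
        if result in FAILURE_RESULTS:
            self._streaks[key] += 1
        elif result == "success":
            self._streaks[key] = 0  # a success ends the streak
        # Other outcomes (e.g. pending_approval) leave the streak unchanged.
        return self._streaks[key]
```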
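Alert Thresholds Reference
The table below turns the four early warning signals into concrete alert conditions. It also adds two critical signals that should never occur in normal operation: cross-user retrieval attempts (§4.4) and credentials in model output. The thresholds are starting points; adjust them based on your agent's actual baseline behavior.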
Early warning signals and alert thresholds
| Signal | What it indicates | Recommended threshold | Response |
|---|---|---|---|
| Tool calls per session > 5x p95 baseline | Prompt injection loop; agent directed toward attacker's goal | Varies by agent type; establish a per-agent baseline over 7 days | Flag for human review; consider auto-suspending the session |
| Cost-per-request p99/p50 ratio > 10x | Inputs crafted to maximize token consumption (Denial of Wallet) | Monitor ratio continuously; alert when it exceeds 10x over a 5-minute window | Enforce per-user token budget; rate-limit by token count, not request count |
| Per-user daily spend > configured budget | Sustained Denial of Wallet or leaked API key abuse | Set a per-user daily budget at 3–5x average daily spend | Return 429 with cost-specific headers; notify user and ops team |
| Tool call outside agent's expected tool set | Prompt injection redirecting agent to an unexpected capability | Any occurrence is anomalous — zero expected occurrences in normal operation | Immediate alert; investigate session; review retrieved content for injection |
| Same tool fails 3+ consecutive times in one session | Error loop from injection or hallucination | 3 consecutive failures on the same tool | Alert ops team; auto-block further calls to that tool in this session for irreversible tools |
| Retrieval query targeting another user's namespace | Cross-user data access attempt (§4.4) | Any occurrence — should be impossible with proper namespace isolation | Critical alert; investigate for injection or isolation configuration bug |
| Credentials or secret patterns in model output | System prompt leakage or injection causing credential exposure | Any regex match on API key patterns, private key headers, or password field patterns in output | Block output delivery; critical alert; rotate the exposed credential |
Alert thresholds should be tuned to your agent's normal behavior. A threshold that is too low generates alert fatigue (§4.2); too high, and real incidents go undetected. Start conservative and adjust after two weeks of baseline data.
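Scanning model output for credentials before delivery can be sketched as below. The patterns are illustrative only. They cover PEM private-key headers, AWS access key IDs (which begin with `AKIA` or `ASIA`), and JSON-style password fields, so extend them with the credential formats your stack actually uses. `OutputBlockedError` and `deliver_or_block` are hypothetical names. The exception reports only pattern names, never the matched text, so the blocked secret does not leak into error logs.

```python
import re
from typing import List

# Illustrative patterns; extend with the credential formats your stack uses.
OUTPUT_SECRET_PATTERNS = {
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----"),
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "password_field": re.compile(r'"password"\s*:\s*"[^"]+"', re.IGNORECASE),
}


class OutputBlockedError(Exception):
    """Raised instead of delivering a response that matched a secret pattern."""


def scan_model_output(text: str) -> List[str]:
    """Return the names of secret patterns found in a model response."""
    return [name for name, pattern in OUTPUT_SECRET_PATTERNS.items() if pattern.search(text)]


def deliver_or_block(text: str) -> str:
    hits = scan_model_output(text)
    if hits:
        # The caller raises the critical alert and rotates the credential.
        raise OutputBlockedError(f"response blocked; matched {hits}")
    return text
```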
Putting It Together: A Minimal Monitoring Stack
For most applications, the core monitoring requirements can be met with three components:
Audit log store: Any structured log destination works — CloudWatch Logs, Datadog, Grafana Loki, or even a PostgreSQL table for smaller deployments. The key requirement is that log entries are queryable by session_id, user_id, tool_name, and timestamp, so you can aggregate tool call counts per session and calculate per-user costs.
Content store: A separate storage system for full tool arguments, model responses, and user messages. An S3 bucket or equivalent object storage with server-side encryption and access policies tied to a security team role works well. The audit log entry references content here by ID; the content store itself is not queried during normal operations monitoring.
Alerting: Route critical signals — tool calls outside scope, credentials in output, cross-user access attempts — to your security team immediately. Route medium signals — session tool call spikes, cost anomalies, error loops — to an operations channel where someone reviews them within the hour. Reserve the security team channel for signals that indicate an active attack in progress. Too many alerts there will cause alert fatigue, and real incidents can get missed.
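For the smaller-deployment option, here is a minimal audit table with the indexes and two aggregate queries the alerts above depend on. SQLite (from Python's standard library) stands in for PostgreSQL so the sketch runs anywhere; the SQL is ordinary. The table and column names are illustrative. The time filters compare timestamps as text, which works only if every entry uses one fixed ISO 8601 UTC format. Mixing `Z` and `+00:00` suffixes, for instance, would break the ordering.

```python
import sqlite3

SCHEMA = """
CREATE TABLE audit_log (
    timestamp   TEXT NOT NULL,   -- one fixed ISO 8601 UTC format
    session_id  TEXT NOT NULL,
    user_id     TEXT NOT NULL,
    agent_id    TEXT NOT NULL,
    tool_name   TEXT,
    result      TEXT,
    cost_usd    REAL DEFAULT 0,
    content_ref TEXT             -- pointer into the content store, never content
);
CREATE INDEX idx_audit_session ON audit_log (session_id);
CREATE INDEX idx_audit_user_time ON audit_log (user_id, timestamp);
CREATE INDEX idx_audit_tool ON audit_log (tool_name);
"""

# Tool calls per session for one agent since a cutoff (spike detection).
TOOL_CALLS_PER_SESSION = """
SELECT session_id, COUNT(*) AS tool_calls
FROM audit_log
WHERE agent_id = ? AND timestamp >= ? AND tool_name IS NOT NULL
GROUP BY session_id
"""

# Spend per user since a cutoff (budget alerts), highest first.
USER_SPEND_SINCE = """
SELECT user_id, ROUND(SUM(cost_usd), 6) AS spend_usd
FROM audit_log
WHERE timestamp >= ?
GROUP BY user_id
ORDER BY spend_usd DESC
"""


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the audit database and apply the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```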
The Relationship Between Monitoring and the Other Controls
Monitoring does not replace the prevention controls in §4.1–4.4 — it works alongside them. Understanding how they interact helps you prioritize gaps.
Least Privilege (§4.1) limits the blast radius; monitoring detects when the boundary is being tested. If your customer support agent does not have delete permissions, an attacker who injects a delete instruction will see their calls fail. Those failures appear as repeated same-tool failures in your monitoring — giving you a signal even though the attack did not succeed.
HITL controls (§4.2) pause high-risk actions; monitoring records the decision. When a Tier 3 action is approved or denied, the audit log captures who made that decision and when. If an approved action later turns out to have been triggered by an injection, the audit trail gives you the chain of events to investigate.
Context isolation (§4.4) prevents cross-user data access; monitoring detects the attempts. With proper namespace isolation, a cross-user retrieval query fails at the database level. Your monitoring should still log the attempt — a pattern of cross-namespace queries from one session is evidence of either a configuration bug or an active probe for isolation weaknesses.
Summary
Monitoring for AI agents is built on one principle: separate what happened (audit metadata, logged broadly) from what was processed (content, stored narrowly with access controls). The audit trail gives you the signals you need to detect and investigate incidents; keeping content out of the audit trail prevents the log store from becoming a sensitive data liability.
The four early warning signals to alert on:
- Tool call volume spike — a session making far more tool calls than its baseline is likely under the influence of a prompt injection
- Cost anomaly without a traffic change — the first sign of a Denial-of-Wallet attack is usually a billing alert, not a performance alert
- Tool calls outside expected scope — any tool call that does not match the agent's defined capability set is anomalous by definition
- Repeated failures on the same action — an agent retrying the same failing tool call is stuck in a loop, whether from injection or hallucination
Together with the prevention controls in §4.1–4.4, these monitoring signals complete the security posture for a production agentic system. Chapter 5 consolidates all of these controls into a single developer checklist organized by workflow stage.