4.4 Context Isolation — Keeping Users' Data Separate
When an AI agent serves a single user in a single session, data boundaries are straightforward: everything in the context belongs to that one user, and nothing leaks anywhere else. The moment you scale to multiple users sharing the same agent infrastructure, however, you introduce a new category of security failure, one that has caused real incidents at companies whose products you likely use every day.
Context isolation means enforcing strict boundaries between different users' data, memory, conversation history, and tool outputs within a shared AI system. Without these boundaries, one user's private information can appear in another user's responses — not through traditional hacking, but simply because the infrastructure meant to keep them separate is missing or misconfigured.
This is not a theoretical concern. In March 2023, OpenAI took ChatGPT offline after a race condition in a Redis client library caused some users to briefly see fragments of other users' chat histories and billing information. In multi-user agentic systems — where the agent also uses tools, retrieves documents, and maintains memory across sessions — the problem is more complex and the potential consequences are more serious.
Context isolation in agentic systems falls under OWASP LLM02:2025 (Sensitive Information Disclosure), which rose from #6 to #2 on the OWASP Top 10 for LLM Applications between the 2023 and 2025 editions. This rise reflects both the rapid adoption of multi-user AI features and the growing number of documented incidents.
Why Agentic Systems Make This Harder
A simple chatbot has only one place where context can leak: the conversation history. An AI agent has many more:
- Conversation history stored between turns
- Long-term memory the agent carries across sessions
- Retrieved documents from a RAG knowledge base
- Tool call results cached for performance
- Inference caches at the model server level
- Application logs that record what the agent saw and did
Each of these is an independent isolation boundary. A failure at any one of them can leak data — even if the other five are perfectly implemented.
Five Common Failure Modes
Understanding where isolation breaks down is the first step to preventing it. These five patterns appear repeatedly in real incidents.
1. Shared Agent Memory Without Per-User Scoping
Agent memory systems — such as LangChain memory, Letta, or custom Redis-backed conversation stores — maintain a persistent record of what the agent has seen and done. When a single agent instance is shared across multiple users without scoping that memory to individual users, memories written during one user's session become part of the agent's persistent knowledge and can surface in another user's responses.
What this looks like in practice: A developer builds a personal finance assistant and uses a single agent instance for all users, storing financial goals and account summaries in the agent's shared memory. When User B logs in, the agent recalls User A's account balance and budget targets as if they were User B's own data — because both users share the same memory namespace.
2. Vector Store Cross-User Contamination
In RAG (Retrieval-Augmented Generation) pipelines, documents are converted into embeddings — numerical representations of their meaning — and stored in a vector database. When a user sends a query, the database finds and returns the most semantically similar documents. If all users' documents are stored in the same flat index with no physical separation, a query from one user can retrieve documents belonging to a different user, simply because those documents happen to be semantically similar to the query.
The important distinction here is between physical isolation (separate namespaces or shards per user) and metadata filtering (all data in one index, with a user_id filter applied at query time). Metadata filtering can work, but it has a critical weakness: if the filter is bypassed — through a bug, a code refactor, or a prompt injection attack — all users' data is exposed at once. Physical namespace isolation limits the damage to a single user's data even if the application layer fails.
3. Shared Inference Cache Timing Side Channel
When AI applications run on shared infrastructure with prefix caching enabled (a performance optimization that reuses cached computation for identical prompt prefixes), an attacker can measure time-to-first-token (TTFT) latency. A request that hits a cached prefix returns faster than one that misses. By sending carefully crafted queries and measuring the response latency, an attacker can determine whether a specific prefix exists in the cache — and therefore infer something about another user's recent prompt — without ever receiving its content directly.
Research published at NDSS 2025 ("I Know What You Asked: Prompt Leakage via KV-Cache Sharing in Multi-Tenant LLM Serving") demonstrated this attack on shared multi-tenant LLM inference servers. The fix is a cache_salt: a secret value, unique to each user or tenant, that is mixed into the cache key hash so that only requests carrying the same salt can share cached computation blocks. vLLM supports an optional per-request cache_salt field for this purpose.
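The effect of salting can be illustrated with a toy model. This is a minimal sketch, not vLLM's implementation: `ToyPrefixCache`, `block_key`, `cache_salt_for`, and `SERVER_SECRET` are hypothetical names, and a cache hit simply stands in for a fast time-to-first-token.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would come from a secret store.
SERVER_SECRET = b"replace-with-a-real-secret"


def cache_salt_for(user_id: str) -> str:
    """Derive a stable, unguessable per-user salt from the verified user ID."""
    return hmac.new(SERVER_SECRET, user_id.encode(), hashlib.sha256).hexdigest()


def block_key(prefix: str, salt: str = "") -> str:
    """Hash a prompt prefix into a cache key, mixing in the salt when present."""
    return hashlib.sha256(f"{salt}\x00{prefix}".encode()).hexdigest()


class ToyPrefixCache:
    """Stands in for a shared KV cache: a hit means less work, so lower latency."""

    def __init__(self):
        self._keys = set()

    def serve(self, prefix: str, salt: str = "") -> bool:
        key = block_key(prefix, salt)
        hit = key in self._keys
        self._keys.add(key)
        return hit  # True corresponds to a fast time-to-first-token


victim_prompt = "Summarize my diagnosis of condition X"

# Without a salt, the attacker's probe hits the victim's cached prefix.
unsalted = ToyPrefixCache()
unsalted.serve(victim_prompt)
attacker_sees_hit_unsalted = unsalted.serve(victim_prompt)

# With per-user salts, the same probe misses, while the victim still benefits.
salted = ToyPrefixCache()
salted.serve(victim_prompt, cache_salt_for("user-a"))
attacker_sees_hit_salted = salted.serve(victim_prompt, cache_salt_for("user-b"))
victim_still_hits = salted.serve(victim_prompt, cache_salt_for("user-a"))
```

Deriving the salt with an HMAC over the verified user ID is one possible design: it keeps the salt stable across that user's requests, so caching still helps them, while other users cannot compute it.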
4. Session Context Not Fully Reset Between Users
When an LLM orchestration layer fails to fully reset a session's state between users — whether due to a software bug, a race condition under high load, or a misconfigured in-memory store — fragments of one user's conversation remain active when the next user's session begins. Unlike the cache timing attack, this failure does not require any measurement: the contaminating data appears directly in the next user's responses.
This failure mode is closely related to the OpenAI ChatGPT incident of March 2023. A bug in the Redis client library used by OpenAI caused concurrent requests to receive each other's cached data. Before the service was taken offline, some users could see titles from another user's chat history. In addition, during a roughly nine-hour window, payment-related information for approximately 1.2% of active ChatGPT Plus subscribers may have been visible to another user: first and last name, email address, payment address, and the last four digits and expiration date of a credit card.
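The difference between reused and properly scoped session state can be sketched in a few lines. This is an illustrative toy under stated assumptions, not code from any real orchestration framework: `SessionState`, `LeakyOrchestrator`, and `IsolatedOrchestrator` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    user_id: str
    messages: list = field(default_factory=list)


class LeakyOrchestrator:
    """Reuses one state object across users and never clears its history."""

    def __init__(self):
        self._state = SessionState(user_id="")

    def handle(self, user_id: str, message: str) -> list:
        self._state.user_id = user_id          # identity is updated...
        self._state.messages.append(message)   # ...but old messages remain
        return list(self._state.messages)


class IsolatedOrchestrator:
    """Keeps a separate state object per verified user ID."""

    def __init__(self):
        self._states = {}

    def handle(self, user_id: str, message: str) -> list:
        state = self._states.setdefault(user_id, SessionState(user_id=user_id))
        state.messages.append(message)
        return list(state.messages)


leaky = LeakyOrchestrator()
leaky.handle("user-a", "My card ends in 4242")
leaky_context_for_b = leaky.handle("user-b", "Hello")

isolated = IsolatedOrchestrator()
isolated.handle("user-a", "My card ends in 4242")
isolated_context_for_b = isolated.handle("user-b", "Hello")
```

In the leaky version, User B's context contains User A's message even though no attacker did anything; in the isolated version, User B sees only their own message.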
5. Shared Application Logs
Application-level logging writes a record of every user interaction, every tool call, every retrieved document, and every model response. In a multi-user system, if these logs are not scoped per user, they become a single concentration point for all users' sensitive data — accessible to anyone with permission to read the logs.
This is less of a real-time leakage risk (logs generally do not flow between users' active sessions) and more of an accumulation risk: over time, logs collect sensitive data from every user, which can then be exposed through a compromised engineer account, a misconfigured log permissions policy, or a legal subpoena.
Context isolation failure modes and their consequences
| Failure mode | How data leaks | Potential consequence |
|---|---|---|
| Shared agent memory, no user scoping | Memory from User A's session surfaces in User B's context | Private conversation history, financial details, or personal facts disclosed to the wrong user |
| Flat vector namespace, metadata-only filtering | Filter bypass or bug returns User A's documents to User B's query | Private uploaded documents, customer records, or confidential business data retrieved by an unauthorized user |
| Shared inference cache, no cache salt | TTFT timing reveals whether a given prefix exists in cache | Attacker can infer contents of another user's recent prompts without receiving them directly |
| Session state not fully reset | Previous user's context fragments remain active | Another user's messages, documents, or PII appears directly in the current user's responses |
| Raw model outputs written to shared logs | Logs accumulate PII from all users' sessions | A single log access event exposes sensitive data from every user who ever used the application |
Source: OWASP LLM02:2025; Giskard Cross-Session Leak Research; NDSS 2025 KV-Cache Timing Paper
Real-World Incidents
ChatGPT Redis Bug (March 2023)
On March 20, 2023, OpenAI took ChatGPT offline after discovering a race condition in the redis-py client library. Under concurrent load, the bug caused cached data belonging to one user to be returned to a different user's request. Some users could see another user's chat history titles and, in some cases, the first message of a newly created conversation. For approximately 1.2% of ChatGPT Plus subscribers who were active during a roughly nine-hour window before the shutdown, the following payment-related information may also have been visible to another user:
- First name, last name, email address, and payment address
- The last four digits and expiration date of a credit card
The root cause was an infrastructure-level caching failure — nothing malicious or complex. It was simply a missing isolation boundary in a shared cache.
Telemedicine AI Assistant: Patient Data Across Sessions (Research Scenario)
Giskard, an AI security testing firm, demonstrated a scenario in which a telemedicine clinic's AI assistant cached unredacted patient consultation notes without proper session isolation. In the scenario, a subsequent user who framed a request as an "internal clinical validator" was able to retrieve cached consultation notes from a previous session — including the patient's name, date of birth, diagnosis, medications, lab values, social security number, address, and insurance policy number. The estimated regulatory exposure under HIPAA exceeded $1.5 million in fines.
Although this was a red-team demonstration rather than a confirmed production breach, the root cause it illustrates is the same as the ChatGPT incident: cached context from one session was accessible in a subsequent session because the session boundary was not enforced at the data storage layer.
Hugging Face and Replicate: Cross-Tenant Attacks (April 2024)
Wiz security researchers found that uploading a malicious model to Hugging Face's shared Kubernetes inference infrastructure enabled code execution that could move laterally to access other tenants' models and data. On Replicate, researchers achieved remote code execution that established a connection to a Redis instance shared across tenant boundaries, potentially exposing other customers' model data and API keys.
These incidents demonstrate that context isolation is not just an application-layer concern. The shared inference infrastructure that runs your agents carries the same isolation requirements as your application code.
How to Implement Context Isolation
Scope All Memory to a Verified Server-Side User ID
The most fundamental rule: every piece of data that an agent stores or retrieves must be keyed by a server-verified user identifier — never by a value the user supplies.
Server-verified means the user ID comes from your authentication system (a validated JWT claim or a session object set by your auth middleware), not from a query parameter, a request body field, an unsigned cookie value, or anything else the client can set freely. A client can always submit a different user's ID in a field it controls; it cannot forge a signed token or a server-side session without the server's secret.
Cross-User Memory Contamination
Severity: High. An agent memory store keyed only by a user-supplied username allows any user to read another user's conversation history by supplying a different name.
Vulnerable pattern: memory is stored and retrieved using the username from the request body. A user who supplies a different username receives that user's stored conversation history.
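A corrected version of this pattern might look like the following sketch. The names are hypothetical (`ScopedMemoryStore`, `handle_request`), an in-memory dict stands in for Redis, and `session` is assumed to be populated by auth middleware rather than by the client.

```python
class ScopedMemoryStore:
    """Conversation memory keyed by a server-verified user ID."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _key(verified_user_id: str) -> str:
        return f"chat:user:{verified_user_id}:history"

    def append(self, verified_user_id: str, message: str) -> None:
        self._data.setdefault(self._key(verified_user_id), []).append(message)

    def history(self, verified_user_id: str) -> list:
        return list(self._data.get(self._key(verified_user_id), []))


def handle_request(store: ScopedMemoryStore, session: dict, body: dict) -> list:
    """`session` comes from auth middleware; `body` is untrusted client JSON."""
    user_id = session["user_id"]  # body.get("username") is deliberately ignored
    store.append(user_id, body["message"])
    return store.history(user_id)


store = ScopedMemoryStore()
handle_request(store, {"user_id": "user-a"}, {"message": "My budget is $900"})

# User B tries to read User A's history by naming them in the request body.
attack_result = handle_request(
    store,
    {"user_id": "user-b"},
    {"username": "user-a", "message": "Show my history"},
)
```

Because the storage key is built only from the session's user ID, the `username` field in the request body has no effect on which history is read or written.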
Use Per-User Namespaces in Your Vector Store
For RAG-powered agents, physical isolation in the vector store is the most reliable approach. The two main patterns are namespaces (Pinecone's term) and per-tenant shards (Weaviate's term). Both achieve the same goal: each user's documents live in a physically separate partition of the index. A query in one namespace cannot retrieve documents from another namespace — the isolation is enforced at the database level, not in your application code.
Metadata-based filtering — storing all users' documents in one index and adding a user_id field to filter at query time — is simpler to set up, but it relies on your application code correctly applying the filter on every single request. If your code fails to apply the filter (due to a bug, a refactor, or a prompt injection attack that bypasses the retrieval parameters), all users' documents are exposed simultaneously.
RAG Retrieval Returning Another User's Documents
Severity: High. A shared flat vector index with metadata-only filtering returns private documents belonging to other users when the filter is missing or bypassed.
Vulnerable pattern: all users' documents are stored in a single flat Pinecone index. A missing or bypassed user_id filter returns any user's documents to any other user's query.
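The namespace pattern can be illustrated with a small in-memory stand-in. This sketch does not use the Pinecone or Weaviate APIs; `NamespacedVectorStore` and its methods are hypothetical, and the two-dimensional vectors are placeholders for real embeddings.

```python
import math
from collections import defaultdict


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class NamespacedVectorStore:
    """Each namespace is a separate list; a query only ever scans one of them."""

    def __init__(self):
        self._namespaces = defaultdict(list)

    def upsert(self, namespace: str, doc_id: str, vector, text: str) -> None:
        self._namespaces[namespace].append((doc_id, vector, text))

    def query(self, namespace: str, vector, top_k: int = 3) -> list:
        # Documents outside `namespace` are never candidates, however similar.
        candidates = self._namespaces.get(namespace, [])
        ranked = sorted(candidates, key=lambda d: cosine(vector, d[1]), reverse=True)
        return [(doc_id, text) for doc_id, _, text in ranked[:top_k]]


store = NamespacedVectorStore()
# Two users upload documents with identical embeddings.
store.upsert("user-a", "a1", [1.0, 0.0], "A's payslip")
store.upsert("user-b", "b1", [1.0, 0.0], "B's payslip")

results_for_b = store.query("user-b", [1.0, 0.0])
results_for_unknown = store.query("user-c", [1.0, 0.0])
```

Even though both documents are equally similar to the query, User B's search can only return User B's document. The namespace argument should be the server-verified user ID, not a value taken from the request or from model output.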
Use Row-Level Security for Database-Backed Agent Memory
When your agent's memory, retrieved documents, or tool call results are stored in a relational database (PostgreSQL with pgvector is a common choice for RAG applications), Row-Level Security (RLS) provides isolation enforcement at the database engine level — independent of your application code.
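An RLS policy on the document_embeddings table that restricts every SELECT to rows where user_id = current_setting('app.current_user_id')::uuid prevents the application from retrieving another user's rows, even if a bug in the application code omits a WHERE user_id = ? clause. The database does not reject such a query; it silently filters the result down to the current user's rows. This holds as long as the application connects with a role that is not a superuser and does not have the BYPASSRLS attribute.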
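```sql
-- Enable RLS on the embeddings table
ALTER TABLE document_embeddings ENABLE ROW LEVEL SECURITY;
-- FORCE applies the policy to the table owner as well
ALTER TABLE document_embeddings FORCE ROW LEVEL SECURITY;

-- Policy: each user can only see and modify their own documents
CREATE POLICY user_isolation_policy ON document_embeddings
    FOR ALL
    USING (user_id = current_setting('app.current_user_id')::uuid);

-- Application sets the variable at the start of each request's transaction
-- (before any queries run). SET LOCAL lasts only until COMMIT or ROLLBACK,
-- so the value cannot carry over to another request on the same connection.
BEGIN;
SET LOCAL app.current_user_id = 'the-verified-user-uuid';
-- ... run this request's queries here ...
COMMIT;
```
One important caveat: this pattern relies on a per-request variable (app.current_user_id) rather than on the database username. When you use a connection pooler like PgBouncer, all requests share the same database role, so the database username provides no per-user isolation. Pooled connections are also reused across requests, so set the variable with SET LOCAL inside the request's transaction; a plain session-level SET would persist on the connection and could apply one user's ID to the next request that reuses it. The application, which already holds the verified user identity, sets the value fresh for each request.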
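On the application side, the variable can be scoped to a single transaction. The following is a rough sketch assuming a DB-API 2.0 driver such as psycopg2 with autocommit off and the `%s` parameter style; `fetch_user_documents` and the `id` and `content` columns are hypothetical.

```python
def fetch_user_documents(conn, verified_user_id: str, limit: int = 10) -> list:
    """Run one request's query with the RLS variable scoped to its transaction."""
    cur = conn.cursor()
    try:
        # set_config(name, value, true) is the function form of SET LOCAL:
        # the value is discarded at COMMIT or ROLLBACK, so a pooled
        # connection cannot carry it over to the next request.
        cur.execute(
            "SELECT set_config('app.current_user_id', %s, true)",
            (verified_user_id,),
        )
        # No WHERE user_id = ... clause here: the RLS policy supplies it.
        cur.execute(
            "SELECT id, content FROM document_embeddings LIMIT %s",
            (limit,),
        )
        rows = cur.fetchall()
        conn.commit()
        return rows
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Using `set_config` instead of a literal SET statement also lets the user ID be passed as a bound parameter rather than interpolated into the SQL string.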
Scope Per-User Agent Instances in Memory Frameworks
Agent frameworks that maintain long-term memory (such as Letta, formerly MemGPT) are designed to create one agent instance per user, not one shared agent for all users. When a single agent instance is shared, memories accumulate from all users in the same memory store. The correct pattern is to associate each agent with a specific user identity so that memory reads and writes are automatically scoped to that user.
```python
# ✅ One agent per user, scoped by identity
# Letta example: each user gets their own agent with their own memory.
# Method names are illustrative; the Letta SDK has changed across releases,
# so check them against the version you have installed.
client = create_client()

# Identity ties the agent to a specific user
identity = client.create_identity(
    identifier_key=authenticated_user_id,  # server-verified ID
    name=username,
    identity_type="user",
)

# Retrieve the existing agent for this user, or create one on first use.
# Memory reads and writes are automatically scoped to this agent instance.
agents = client.list_agents(identity_id=identity.id)
user_agent = agents[0] if agents else client.create_agent(identity_ids=[identity.id])
```
The key principle is the same as for Redis memory: the scoping identifier must come from your server-side authentication system, not from client input.
Set TTLs on All Cached Session Data
Any data cached for performance — conversation history, tool call results, retrieved document summaries — should have a time-to-live (TTL) set. TTLs serve two isolation purposes:
- Prevents stale context from bleeding into future sessions. A session that ended hours ago should not still be retrievable. A TTL guarantees that even if a logic error causes your application to look up the wrong session key, expired data is not returned.
- Limits the accumulation of sensitive data in caches. Caches that grow without bounds become archives of sensitive information. A 24-hour TTL on conversation history means that if a cache is breached or misconfigured, it exposes at most one day's worth of data per user.
```python
import redis

redis_client = redis.Redis()  # connection settings omitted

# Set conversation history with a 24-hour TTL.
# user_id must be the server-verified ID, never a client-supplied value.
redis_client.setex(
    name=f"chat:user:{user_id}:history",
    time=86400,  # 24 hours in seconds
    value=serialized_history,
)

# Refresh the TTL on each active use, so active sessions stay alive
# while abandoned sessions expire automatically
redis_client.expire(f"chat:user:{user_id}:history", 86400)
```
The Confused Deputy in Multi-User Agents
Section 4.1 introduced the confused deputy problem: an agent holds more permissions than the user interacting with it. In a multi-user system, this problem combines with context isolation to create an additional threat.
Imagine an agent that serves multiple users and holds a service account with access to all users' records. User A sends a message containing an indirect prompt injection — hidden instructions embedded in an uploaded document that tell the agent to retrieve and report User B's records. The agent's context window now contains the injected instruction. If the retrieval layer correctly applies per-user isolation, the instruction fails — the agent can only retrieve User A's documents. But if the retrieval layer relies on a metadata filter, and the injected instruction manipulates those filter parameters, the attack may succeed.
This is why context isolation and Least Privilege (Section 4.1) must be implemented together. Least Privilege limits what the agent can do; context isolation limits which user's data it can access. Both are necessary, and neither is sufficient alone.
With namespace-based physical isolation, the same attack fails at the database level: the agent's retrieval call is physically constrained to User A's namespace, and the injected instruction to retrieve User B's records returns nothing.
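One way to keep injected instructions away from the isolation key is to bind the user ID when the tool is constructed, so the model never sees it as an argument. This is a minimal sketch with hypothetical names (`make_retrieval_tool`, `search_documents`, `DOCUMENTS`), using a dict and substring matching in place of a real retrieval backend.

```python
from typing import Callable

# Stand-in for per-user storage; each key plays the role of a namespace.
DOCUMENTS = {
    "user-a": ["A: uploaded contract"],
    "user-b": ["B: medical record"],
}


def make_retrieval_tool(verified_user_id: str) -> Callable[[str], list]:
    """Build the tool the model may call, with the namespace fixed up front."""

    def search_documents(query: str) -> list:
        # The model controls only `query`. There is no user or namespace
        # argument for injected text to overwrite.
        docs = DOCUMENTS.get(verified_user_id, [])
        return [d for d in docs if query.lower() in d.lower()]

    return search_documents


# The tool is built per request from the authenticated session.
tool_for_a = make_retrieval_tool("user-a")

# An injected instruction asks for User B's data; the search stays in A's scope.
injected_result = tool_for_a("medical record")
legitimate_result = tool_for_a("contract")
```

The design choice here is that the tool schema exposed to the model contains no field that selects a user, tenant, or filter; those are fixed by server code before the model runs.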
Auditing Your Isolation Boundaries
When reviewing a multi-user agentic system for context isolation issues, work through this checklist:
For every data store the agent reads from or writes to:
- Is access controlled by a server-verified user identity, or by something the client can influence?
- If a developer accidentally omits user scoping in a query, does the database still enforce the boundary (via RLS or namespace isolation), or does the mistake expose all users' data?
- Is there a TTL on cached data, or can stale sessions accumulate and be retrieved indefinitely?
For the RAG retrieval layer:
- Are documents physically isolated per user (separate namespaces or shards), or do all users' documents share the same index?
- Is authorization verified after retrieval? Even with namespace isolation, a retrieved document may contain content the user is not authorized to see (for example, documents at different permission levels within the same user's namespace).
- Can a prompt injection attack in a user's query or uploaded document manipulate the retrieval parameters to bypass the namespace or metadata filter?
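The post-retrieval authorization check from the list above can be sketched as a filter applied to results before they reach the model. The role names, `ROLE_RANK`, `RetrievedDoc`, and `authorize_results` are all hypothetical; a real system would consult its own permission model.

```python
from dataclasses import dataclass

# Hypothetical permission levels, lowest to highest.
ROLE_RANK = {"viewer": 0, "editor": 1, "admin": 2}


@dataclass(frozen=True)
class RetrievedDoc:
    doc_id: str
    required_role: str
    text: str


def authorize_results(docs: list, user_role: str) -> list:
    """Drop retrieved documents that the caller's role is not allowed to read."""
    allowed = ROLE_RANK[user_role]
    return [d for d in docs if ROLE_RANK[d.required_role] <= allowed]


retrieved = [
    RetrievedDoc("d1", "viewer", "Team handbook"),
    RetrievedDoc("d2", "admin", "Salary review notes"),
]

visible_to_viewer = authorize_results(retrieved, "viewer")
visible_to_admin = authorize_results(retrieved, "admin")
```

Filtering after retrieval means a document that matched the query but exceeds the caller's permissions is never placed in the model's context.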
For agent memory frameworks:
- Is there one agent instance per user, or is a single agent instance shared across all users?
- If shared, is memory partitioned by user identity at the storage layer?
- Are memory writes scoped to the authenticated user, or can the agent write memories "for" a user based on instructions it received from that user's input?
For application logs:
- Are raw user messages or model responses written to logs? (They should not be, or they should be redacted first.)
- Are logs accessible to all engineers, or scoped to those with a legitimate need?
- Do log entries include sufficient metadata (user ID, session ID, timestamp) to enable security investigations without including the sensitive content itself?
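The logging guidance above can be enforced at the logging layer itself. The sketch below uses Python's standard `logging` module; `RedactContentFilter`, `ListHandler`, the `agent.audit` logger name, and the `content` field passed through `extra` are assumptions made for illustration.

```python
import logging


class RedactContentFilter(logging.Filter):
    """Replace message content with its length before a record is written."""

    def filter(self, record: logging.LogRecord) -> bool:
        content = getattr(record, "content", "")
        record.content_chars = len(content)
        record.content = "[redacted]" if content else ""
        return True


class ListHandler(logging.Handler):
    """Collects formatted lines in memory; stands in for a real log sink."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


handler = ListHandler()
handler.setFormatter(logging.Formatter(
    "%(message)s user=%(user_id)s session=%(session_id)s "
    "chars=%(content_chars)d content=%(content)s"
))
handler.addFilter(RedactContentFilter())

logger = logging.getLogger("agent.audit")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep unredacted records away from parent handlers

logger.info(
    "model_response",
    extra={"user_id": "user-a", "session_id": "s-1",
           "content": "Your balance is $4,210"},
)
logged_line = handler.lines[0]
```

The resulting line keeps the user ID, session ID, and content length for investigations, while the response text itself never reaches the log sink.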
Context isolation implementation patterns by component
| Component | Isolation pattern | What fails without it |
|---|---|---|
| Conversation history (Redis) | Namespaced key: chat:user:{server_verified_id}:history + TTL | Logic bug or race condition serves one user's history to another |
| Vector store / RAG index | One namespace or shard per user (Pinecone namespace, Weaviate tenant shard, pgvector RLS) | Metadata filter bypass or prompt injection retrieves another user's documents |
| Agent long-term memory | One agent instance per user, scoped by verified identity object (Letta identity, LangChain session_id) | Memories from one user's sessions surface in another user's responses |
| Database rows (pgvector, PostgreSQL) | Row-Level Security policy tied to app.current_user_id session variable | Application code bug omitting WHERE clause exposes all users' rows |
| Shared inference cache (vLLM) | Per-request cache_salt derived from verified user ID | TTFT timing side-channel allows prefix reconstruction across user sessions |
| Application logs | Log user ID and session ID only; redact message content and model responses before writing | Logs become an archive of every user's sensitive data, exposed by any log access incident |
Defense-in-depth: apply namespace isolation AND verify the scoping key comes from server-side authentication, not client input.
Summary
Context isolation is not a single control — it is a discipline applied consistently across every layer of a multi-user agentic system. The principle is the same at every layer: any data the agent stores, retrieves, or caches must be scoped to a server-verified user identity. The user's client must not be able to influence that identity. Isolation must be enforced at the data storage layer — not just in application code — so that an application-layer bug cannot bypass it.
The three most important implementation choices are:
- Physical isolation in your vector store (namespaces or per-tenant shards) rather than metadata-only filtering, so that a filter bypass cannot expose all users' data at once.
- RLS policies on database tables used for agent memory or RAG storage, so that the database engine enforces isolation independently of application code.
- Server-verified user IDs as the single source of truth for all storage keys — never use anything the client submits as an isolation identifier.
Section 4.5 covers the other side of visibility: what your logs should record about the agent's actions, and what signals indicate that isolation boundaries are being probed or breached.
Sources:
- OWASP Top 10 for LLM Applications v2.0 (2025) — LLM02: Sensitive Information Disclosure
- OpenAI: March 20, 2023 ChatGPT Outage Post-Mortem
- Wiz Research: Hugging Face and Replicate Cross-Tenant Vulnerabilities (April 2024)
- I Know What You Asked: Prompt Leakage via KV-Cache Sharing in Multi-Tenant LLM Serving (NDSS 2025)
- Giskard AI — Cross-Session Data Leak Research