3.4 Sensitive Information Disclosure
When you integrate an LLM into your application, sensitive data can leak through paths that do not exist in traditional software. A user's private message can appear in a different user's response. A model can reproduce, word for word, personal data it encountered years ago during training. These are not theoretical risks — they have caused real incidents at major companies.
This is OWASP LLM02:2025 — Sensitive Information Disclosure. It jumped from #6 to #2 on the OWASP Top 10 for LLM Applications between the 2023 and 2025 editions — the largest single ranking leap of any category. This rise reflects both the rapid growth of LLM-powered products and the accumulation of documented incidents.
Sensitive information disclosure has two distinct mechanisms that require different defenses:
- Context window data leakage — the model returns data from its context window (the block of text fed to the model before it generates a response) that belongs to one session, user, or retrieved document, and exposes it to a user who was never meant to see it. This is an application architecture problem.
- Training data memorization — the model reproduces text it was trained on, which may include PII (Personally Identifiable Information), proprietary code, or copyrighted content. This is a fundamental property of how large language models work.
Understanding the difference matters because the causes and defenses are entirely different.
Mechanism 1: Context Window Data Leakage
Every response an LLM produces is a function of its context window — the accumulated text fed to it before generating the reply. In a single-user, single-turn application this is straightforward. In multi-user or multi-turn systems, the context window becomes a liability if its boundaries are not carefully managed.
Context leakage occurs when data intended for one user or session ends up in another user's context. Three failure modes cause this:
Shared memory without per-user isolation. Some LLM applications maintain a persistent memory store to give the model access to prior conversation history. If that memory is implemented as a single shared store indexed only by a username string — or worse, not isolated at all — a logic error, race condition, or injection attack can cause one user's history to appear in another's context.
Shared RAG retrieval without access controls. RAG (Retrieval-Augmented Generation) is a technique where your application searches a knowledge base for relevant documents and inserts them into the model's context before generating a response. If the retrieval layer does not enforce per-user access controls, a user querying your knowledge base could trigger retrieval of a document that another user uploaded privately. The model will then summarize, quote from, or reference that document in its response.
Shared inference caches. Some deployments cache parts of the model's computation (often called a KV cache or prefix cache) to speed up repeated or similar requests. If these caches are not properly scoped to individual users, a cached result from one user's session can be served to a different user.
The March 2023 OpenAI Redis Bug
This is not a hypothetical risk. On March 20, 2023, a bug in the open-source Redis client library used by OpenAI caused ChatGPT to expose some users' data to other users. Some users could see the titles of other users' conversations and, in some cases, the first message of a newly created conversation. In addition, during a roughly nine-hour window, payment-related information for approximately 1.2% of the ChatGPT Plus subscribers active at the time may have been visible to other users: name, email address, payment address, and the last four digits and expiration date of a credit card. The cause was a race condition in connection pooling: cached data from one user's request was served to a different user's session.
The root cause was an infrastructure-level caching failure, but the lesson applies equally to application-level memory and retrieval: anywhere you cache or store user-specific context and serve it in subsequent requests, a boundary failure leaks private data.
The Samsung Incident: Data Going the Other Way
The Samsung incidents in April 2023 illustrate context leakage in the other direction: sensitive data entering an LLM's context window and leaving through the provider's infrastructure.
Within a single month, three separate Samsung semiconductor engineers pasted confidential internal data into ChatGPT prompts:
- One engineer pasted proprietary source code and asked it to find a bug.
- A second pasted a different section of internal source code to request optimization suggestions.
- A third pasted transcripts of internal meetings.
All of this data was sent to and processed by OpenAI's servers. Samsung subsequently banned all generative AI tools company-wide and capped prompt lengths at 1,024 bytes as an emergency measure to limit further exposure.
This case shows that context leakage is not only a technical architecture problem but also a user behavior problem. According to Lasso Security research, 13% of all enterprise GenAI prompts contain sensitive organizational data. Developers building internal AI tools need to design both the system architecture and the user experience to minimize how much sensitive data enters the model context in the first place.
Cross-User Context Leakage via Shared Memory
Severity: High. A multi-user AI assistant retrieves conversation history from a shared store without per-user isolation, exposing one user's data to another.
Memory is stored and retrieved using only the username as a key. If a logic error, injection attack, or caching bug causes the wrong username to be used in a lookup, another user's data appears in the context.
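The pattern described here, and one possible fix, can be sketched as follows. Both classes are hypothetical illustrations: `SharedMemoryStore` trusts whatever username the caller passes, while `IsolatedMemoryStore` assumes a server-side session table (here a plain dictionary mapping session tokens to verified user IDs) and derives the storage key from it.

```python
from collections import defaultdict


class SharedMemoryStore:
    """Vulnerable pattern: history is keyed by a caller-supplied username."""

    def __init__(self) -> None:
        self._history: dict[str, list[str]] = defaultdict(list)

    def append(self, username: str, message: str) -> None:
        self._history[username].append(message)

    def load(self, username: str) -> list[str]:
        # Any code path that passes the wrong username gets that user's history.
        return list(self._history[username])


class IsolatedMemoryStore:
    """Safer pattern: the key is derived from a server-verified session."""

    def __init__(self, sessions: dict[str, str]) -> None:
        # Hypothetical session table: session token -> verified user ID.
        self._sessions = sessions
        self._history: dict[str, list[str]] = defaultdict(list)

    def _key(self, session_token: str) -> str:
        user_id = self._sessions.get(session_token)
        if user_id is None:
            raise PermissionError("unknown or expired session")
        return f"user:{user_id}"

    def append(self, session_token: str, message: str) -> None:
        self._history[self._key(session_token)].append(message)

    def load(self, session_token: str) -> list[str]:
        # .get avoids creating empty entries for users with no history.
        return list(self._history.get(self._key(session_token), []))
```

In the second version there is no code path that accepts a user identifier from the request itself, so a forged or mistyped username cannot select another user's history.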
Mechanism 2: Training Data Memorization
Large language models do not store training examples the way a database stores rows. Instead, they learn by compressing patterns from billions of documents into billions of numerical parameters — called weights — during training. However, this compression is imperfect: sequences that appear repeatedly in the training data can be encoded in a way that allows the model to reproduce them nearly word for word when prompted in the right way.
This phenomenon is called training data memorization, and it has been systematically studied by academic researchers.
What the Research Shows
Nicholas Carlini and colleagues (2021) demonstrated the first practical training data extraction attack on GPT-2. By querying the model with strategically crafted prefixes, they extracted hundreds of verbatim sequences from GPT-2's training data — including real people's names, phone numbers, email addresses, IRC chat logs, code snippets, and 128-bit UUIDs. The work was published at USENIX Security 2021. Critically, larger models were consistently more vulnerable than smaller ones: as a model gains capacity, it memorizes more.
A follow-up study by Nasr, Carlini, and colleagues (2023) extended this work to production LLMs, including ChatGPT. Using a "divergence attack" (prompting the model to repeat a single word indefinitely), they found that ChatGPT would eventually abandon its safety-tuned behavior and begin outputting raw text from its pre-training data. In this mode, the model reproduced memorized training sequences at a rate roughly 150 times higher than during normal behavior. With approximately $200 in API calls, they extracted several megabytes of training data from ChatGPT, including names, addresses, phone numbers, fax numbers, and other PII.
Further memorization rates:
- GPT-J memorized at least 1% of its training set
- StarCoder memorized at least 8% of sampled training examples
- For GPT-J, email addresses could be extracted correctly 16% of the time
- Fine-tuning amplification: fine-tuning a model on repeated sensitive data raises memorization rates from a baseline of 0–5% to 60–75% — a 64% average increase
These findings mean that if your organization fine-tunes an LLM on internal data — customer records, legal documents, proprietary code — and that data contains PII or confidential content, the fine-tuned model may reproduce fragments of that data to users who ask the right questions, even if those users were never meant to have access.
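One way to probe for this before release is to sample outputs from the fine-tuned model and check them for long word sequences shared with the fine-tuning records. The sketch below is only an assumption about how such a check might look: the function names are hypothetical, the comparison is whitespace-tokenized n-gram overlap, and the threshold `n=8` is an arbitrary starting point rather than a value from the research cited above.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of n-word sequences in text (empty if text is shorter than n)."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def reproduces_training_text(output: str, training_records: list[str], n: int = 8) -> list[int]:
    """Return indexes of training records that share at least one n-gram with output."""
    output_grams = ngrams(output, n)
    return [
        index
        for index, record in enumerate(training_records)
        if output_grams & ngrams(record, n)
    ]
```

A non-empty result does not prove memorization, since common phrases can overlap by chance, but it flags outputs worth reviewing by hand. For large datasets you would precompute the training n-grams once instead of per call.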
What Kinds of Data Get Memorized
The probability of memorization increases significantly when:
- A sequence appears multiple times in the training data (deduplication reduces memorization by roughly 10x)
- The sequence is long (memorization probability increases with sequence length)
- The sequence follows a distinctive pattern (names paired with phone numbers, email addresses in a recognizable format)
Models trained on public internet data have been shown to reproduce:
- Full names with associated addresses and phone numbers
- Email addresses from publicly accessible mailing lists
- Verbatim passages from books — researchers demonstrated GPT models reproducing the opening pages of Harry Potter novels and Dr. Seuss books
- Source code from public GitHub repositories, including unique function and variable names
- UUIDs, cryptographic keys, and other high-entropy strings that appeared in training data
Types of data at risk and their disclosure mechanisms
| Data type | Memorization risk | Context leakage risk | Primary concern |
|---|---|---|---|
| User PII (names, emails, phone numbers) | Medium (if in training data) | High (if in session memory or RAG) | Cross-user leakage in multi-user systems |
| Health / medical data | Low (rarely in public training data) | High (if users share symptoms, diagnoses) | HIPAA exposure via context leakage |
| Internal source code | High (if fine-tuned on internal codebase) | High (if pasted into prompts) | IP leakage; Samsung-style incidents |
| API keys and credentials | Low-medium (if present in public code) | Very high (if in system prompt or .env) | Direct exploitation of leaked key |
| Proprietary business documents | High (if used in fine-tuning) | High (if in RAG knowledge base) | Unauthorized access to confidential data |
| Copyrighted content | High (if heavily represented in training) | N/A | Legal exposure from verbatim reproduction |
Risk levels are relative and depend on how much of each data type appeared in training and how it is handled at runtime.
Fine-Tuning on Sensitive Data
The most actionable memorization risk for application developers is fine-tuning — the process of continuing to train a pre-built model on your own dataset to specialize its responses for your domain. When you fine-tune on proprietary internal data, you are effectively teaching the model to produce text that resembles your internal documents — which makes it more likely to reproduce fragments of those documents verbatim.
This is especially dangerous when the fine-tuning data contains:
- Customer records with names, account numbers, or transaction histories
- Legal documents with privileged or confidential content
- Employee data, compensation information, or HR records
- Internal security documentation (network diagrams, vulnerability reports)
The mitigation is not to avoid fine-tuning altogether, but to treat fine-tuning data with the same access controls you apply to the data itself, and to scrub PII before it enters any training pipeline.
Practical Defenses
For Context Window Leakage
The defenses here are primarily architectural:
Per-user context isolation. Every piece of data that enters an LLM's context must be scoped to the user who is authorized to see it. In practice this means:
- Namespace all memory store keys with a verified server-side user ID
- Apply access control checks at the RAG retrieval layer, not just at the point where a user makes a request
- Never share inference caches across users without user-specific isolation
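Access control at the retrieval layer can be sketched as a filter that runs before ranking. This is a toy example under stated assumptions: `Document`, its `allowed_user_ids` field, and the keyword-overlap scoring are all hypothetical stand-ins for a real vector store and ACL system. What matters is the order of operations: documents the user may not see are dropped before anything is scored or inserted into the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_user_ids: frozenset[str]  # hypothetical per-document access list


def retrieve(query: str, user_id: str, corpus: list[Document], top_k: int = 3) -> list[Document]:
    """Return up to top_k documents that user_id may read, ranked by keyword overlap."""
    # 1. Enforce access control first, using the server-verified user ID.
    visible = [doc for doc in corpus if user_id in doc.allowed_user_ids]

    # 2. Rank only the visible documents (toy scoring: shared lowercase words).
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.text.lower().split())), doc) for doc in visible]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)

    # 3. Cap the result size so only what the query needs enters the context.
    return [doc for _, doc in relevant[:top_k]]
```

Filtering after ranking would also hide unauthorized documents from the final prompt, but it leaves them in scores, logs, and intermediate results, so filtering first is the safer default.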
Minimize what enters the context. The less sensitive data you put into the LLM's context, the less damage a leakage incident causes:
- Retrieve only what is necessary for the current query (top-3 documents, not the entire knowledge base)
- Load only recent conversation history, not every message from the beginning of the session
- Use anonymized or tokenized identifiers in prompts instead of real names or account numbers where possible
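The last two points can be illustrated with a short sketch. Everything here is hypothetical: the `Pseudonymizer` class, the assumption that account numbers are 8 to 12 digits, and the `<ACCOUNT_n>` placeholder format. The idea is that real identifiers are swapped for placeholders before the prompt is built, and swapped back only in the response shown to the authorized user.

```python
import re

# Assumed account number format for this example: 8 to 12 consecutive digits.
ACCOUNT_RE = re.compile(r"\b\d{8,12}\b")


class Pseudonymizer:
    """Replace account numbers with stable placeholders and restore them later."""

    def __init__(self) -> None:
        self._to_token: dict[str, str] = {}
        self._to_value: dict[str, str] = {}

    def tokenize(self, text: str) -> str:
        def _swap(match: re.Match) -> str:
            value = match.group(0)
            if value not in self._to_token:
                token = f"<ACCOUNT_{len(self._to_token) + 1}>"
                self._to_token[value] = token
                self._to_value[token] = value
            return self._to_token[value]

        return ACCOUNT_RE.sub(_swap, text)

    def restore(self, text: str) -> str:
        for token, value in self._to_value.items():
            text = text.replace(token, value)
        return text


def recent_history(history: list, max_turns: int = 6) -> list:
    """Keep only the most recent turns; max_turns <= 0 keeps nothing."""
    return history[-max_turns:] if max_turns > 0 else []
```

The mapping lives only in your application, so the model provider, the context window, and any logs of the prompt see placeholders rather than real account numbers.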
PII scanning on outputs before logging. AI responses should be scanned for PII patterns before they are written to application logs. Logging raw AI outputs means PII that entered the model context (through user messages or retrieved documents) gets written to log files, which are often stored for months and accessible to many engineers.
For Training Data Memorization
Scrub PII from fine-tuning data. Before using any internal dataset for fine-tuning, apply a PII detection pipeline to find and remove or anonymize personal information. Tools like Presidio (Microsoft, open source), scrubadub, and commercial data loss prevention (DLP) services can detect common PII patterns at scale.
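As a rough illustration of the scrubbing step, the sketch below uses only standard-library regular expressions. It is deliberately simplistic: the patterns assume US-style phone and Social Security numbers, the `prompt` and `completion` field names are an assumed record layout, and regexes cannot find names or free-form addresses. A dedicated tool such as those named above is the better choice for real datasets.

```python
import re

# Illustrative patterns only; real PII detection needs broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def scrub(text: str) -> str:
    """Replace each detected PII span with a typed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def scrub_dataset(records: list[dict]) -> list[dict]:
    """Return copies of fine-tuning records with prompt and completion scrubbed."""
    return [
        {**record, "prompt": scrub(record["prompt"]), "completion": scrub(record["completion"])}
        for record in records
    ]
```

Typed placeholders keep the sentence structure intact, so the model still learns the format of a support reply or a contract clause without learning the specific values.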
Deduplicate your training data. Research shows that training data deduplication reduces memorization rates by approximately 10x. If a name or document appears only once in the training set instead of hundreds of times, the model is far less likely to reproduce it.
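A minimal sketch of exact deduplication, assuming the training set is a list of text records: each record is normalized (whitespace collapsed, lowercased) and hashed, and only the first record for each hash is kept. This catches exact and trivially reformatted duplicates only. Near-duplicate and repeated-substring detection, which deduplication research typically relies on, needs more than this.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(records: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized record, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for record in records:
        digest = hashlib.sha256(normalize(record).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique
```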
Apply output filtering in production. Even without fine-tuning, base model memorization is a real risk. Apply output filters that detect PII patterns (email addresses, phone numbers, SSNs, credit card numbers) in the model's response before it is returned to the user.
PII Leakage Through Missing Output Filtering
Severity: High. An AI assistant returns a model response containing PII from training data memorization or context window data without any output scanning.
The model response is returned directly to the user with no output filtering. PII from training data memorization or context leakage passes through undetected.
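A filtering step between the model and the user might look like the sketch below. It is a simplified assumption of how such a filter could work, not a complete detector: `filter_output` is a hypothetical function, the patterns cover only emails, US Social Security numbers, and card-like digit runs, and card candidates are confirmed with the Luhn checksum to reduce false positives on order numbers and similar digit strings.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    checksum = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0


def filter_output(response: str) -> tuple[str, list[str]]:
    """Redact PII patterns in a model response; return (safe_text, finding_labels)."""
    findings: list[str] = []

    def _redact(label, pattern, text, is_valid=lambda value: True):
        def _swap(match):
            if not is_valid(match.group(0)):
                return match.group(0)  # leave non-matching digit runs untouched
            findings.append(label)
            return f"[REDACTED {label}]"

        return pattern.sub(_swap, text)

    response = _redact("EMAIL", EMAIL_RE, response)
    response = _redact("SSN", SSN_RE, response)
    response = _redact("CARD", CARD_RE, response, luhn_valid)
    return response, findings
```

Returning the list of findings alongside the redacted text lets the caller decide whether to serve the redacted response, block it entirely, or raise an alert, without ever logging the raw values.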
The Logging Trap
One of the most common ways sensitive data accumulates silently is through application logging. A typical AI application logs:
- The user's input message (which may contain PII the user typed)
- The model's output (which may contain PII from memorization or context leakage)
- The full context sent to the model (which may contain retrieved documents with PII from the knowledge base)
Over months of operation, these logs become a concentrated archive of sensitive information — often stored in cloud logging services, accessible to every engineer with log read permissions, and retained for 90 days or more by default. A single compromised engineer account or a misconfigured log permissions policy can expose the sensitive data of every user who has ever used the application.
The rule for logging AI applications:
- Log user IDs and session IDs, not user message content
- Log tool call names and whether they succeeded, not their full parameters
- Log PII-scanned and redacted versions of model outputs, not raw outputs
- Never log the full context window — it can contain retrieved documents with PII from your entire knowledge base
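These rules can be sketched as a function that builds the log record. The field names, the `tool_calls` layout, and the two redaction patterns are assumptions made for this example. The record carries identifiers and metadata, the user message is reduced to its length, tool parameters are dropped, and the model output is redacted before it is stored.

```python
import json
import logging
import re

logger = logging.getLogger("ai_app")

# Illustrative redaction patterns; a real deployment would use a fuller PII scanner.
REDACTIONS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]


def redact(text: str) -> str:
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text


def build_log_record(user_id, session_id, user_message, model_output, tool_calls):
    """Build a log record that contains no raw message content or tool parameters."""
    return {
        "user_id": user_id,
        "session_id": session_id,
        "input_chars": len(user_message),  # size only, never the content
        "output_redacted": redact(model_output),
        "tools": [{"name": call["name"], "ok": call["ok"]} for call in tool_calls],
    }


def log_interaction(user_id, session_id, user_message, model_output, tool_calls):
    record = build_log_record(user_id, session_id, user_message, model_output, tool_calls)
    logger.info(json.dumps(record))
```

Note that the full context window is not a parameter at all, so it cannot be logged by accident.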
This connects directly to Chapter 4 (§4.5 Monitoring and Observability), which covers what to log and what not to log in agentic systems in more detail.
Summary: What to Do
Defenses by risk type
| Risk | Root cause | Primary defenses |
|---|---|---|
| Cross-user context leakage | Missing per-user isolation in memory, cache, or retrieval layers | Namespace by verified user ID; access controls at the retrieval layer; minimize history loaded into context |
| Training data memorization | Sensitive content encoded in model weights during training | Scrub and deduplicate fine-tuning data; apply output PII scanning; use anonymized identifiers in prompts |
| PII accumulation in logs | Raw AI inputs/outputs written to application logs | Log only redacted outputs; log user IDs not message content; apply PII scanning before writing |
| Sensitive data in prompts | Users or developers submit confidential data to the model | User education; prompt design that minimizes PII entry; DLP scanning on inputs |
The checklist item from Chapter 5 that applies here: Does the AI context contain real user PII? If yes, is there a documented justification and a data handling policy? If you cannot answer this question for your application, that is the first thing to address.
Section 3.5 covers a different kind of cost entirely: attacks that target your billing account instead of your data.
Sources:
- OWASP Top 10 for LLM Applications — LLM02:2025 Sensitive Information Disclosure
- Extracting Training Data from Large Language Models (Carlini et al., USENIX Security 2021)
- Scalable Extraction of Training Data from (Production) Language Models (Nasr, Carlini et al., 2023)
- March 20 ChatGPT Outage: Here's What Happened (OpenAI, 2023)