2.6 Secret Leakage from Context Windows

This section covers one of the most frequently reported security incidents involving AI coding agents: live credentials ending up committed to version control or transmitted to AI provider servers. Two separate mechanisms cause this, and both share the same root cause — how AI agents load files into their context window (the complete set of files and text the AI can "see" when generating code).

Understanding both mechanisms is the first step toward preventing either one.

How AI Agents Load Your Files

Modern AI coding agents (Claude Code, Cursor, Codex) do not just see the file you are currently editing. They automatically load surrounding files to generate more relevant code. This typically includes:

  • The file you are working on
  • Files in the same directory
  • Project configuration files (package.json, pyproject.toml, etc.)
  • Environment files they are configured to read (.env, .env.local)
  • Editor configuration files (.cursorrules, MCP config files)

This automatic loading is a useful feature — it makes the AI's suggestions more contextually relevant. The problem is that .env files contain your live credentials, and the AI loads them without any warning.
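To make the mechanism concrete, here is a minimal Python sketch of a context loader that behaves the way this section describes. The function `collect_context` and the `WELL_KNOWN` file list are hypothetical. This is not any agent's real loading logic. It only reproduces the pattern of pulling in the current file, its siblings, and well-known project files.

```python
import tempfile
from pathlib import Path

# Hypothetical list of "well-known" files a loader might pull from the project
# root. Real agents use their own (often configurable) rules.
WELL_KNOWN = ["package.json", "pyproject.toml", ".env", ".env.local", ".cursorrules", ".mcp.json"]


def collect_context(current_file: Path, project_root: Path) -> dict[str, str]:
    """Return {path relative to project_root: file contents} for everything loaded."""
    candidates = [current_file]
    # Every sibling of the current file, with no distinction between code and secrets
    candidates += sorted(p for p in current_file.parent.iterdir() if p.is_file())
    # Well-known project files at the root, .env included
    candidates += [project_root / name for name in WELL_KNOWN]

    context: dict[str, str] = {}
    for path in candidates:
        if path.is_file():
            key = path.relative_to(project_root).as_posix()
            context.setdefault(key, path.read_text(encoding="utf-8"))
    return context


# Usage: a throwaway project with one source file, a pyproject.toml, and a .env
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "db.py").write_text("# TODO: connect to the database\n", encoding="utf-8")
    (root / "pyproject.toml").write_text("[project]\nname = 'demo'\n", encoding="utf-8")
    (root / ".env").write_text("DB_PASSWORD=s3cr3t\n", encoding="utf-8")
    loaded = collect_context(root / "src" / "db.py", root)

print(sorted(loaded))  # ['.env', 'pyproject.toml', 'src/db.py']
```

Nothing in the loop treats `.env` differently from `pyproject.toml`. That gap is what the agent ignore lists discussed later in this section are meant to close.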

Diagram: How Credentials Travel from .env to Source Code

When the AI generates a database connection function, it has already seen the actual credential values from your .env file. Instead of writing os.environ["DB_PASSWORD"] — a reference to the variable — it may write the literal value s3cr3t directly into the generated code, because that is the concrete value it has available in context.

The developer reviews the generated code, sees that it works, and commits it. The credential is now permanently in version control.
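The difference between the two outputs is easy to see side by side. In this minimal Python sketch, the function names and the host and user values are illustrative assumptions; only `DB_PASSWORD` comes from the text above. The first function is what an agent with `.env` in context may generate. The second is the pattern to ask for and to check for in review.

```python
import os


# ❌ What an agent with .env in its context may generate: the literal value
# is now part of the source file and will be committed with it.
def db_settings_hardcoded() -> dict[str, str]:
    return {"host": "db.prod.example.com", "user": "admin", "password": "s3cr3t"}


# ✅ What it should generate: the source names the variable, and the value is
# read at runtime from whatever environment the deployment provides.
def db_settings() -> dict[str, str]:
    return {
        "host": "db.prod.example.com",           # not secret; fine in source
        "user": "admin",
        "password": os.environ["DB_PASSWORD"],  # raises KeyError if unset, so it fails fast
    }


# Usage: simulate the runtime injection, then build the settings
os.environ["DB_PASSWORD"] = "value-injected-at-runtime"
print(db_settings()["password"])  # value-injected-at-runtime
```

A quick review heuristic follows from this. If a string literal in generated code matches a value in your `.env`, the agent copied it.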

The Evidence

The scale of this problem is well documented:

  • GitGuardian's State of Secrets Sprawl 2026 report found that AI-assisted commits leak credentials at roughly twice the GitHub-wide baseline rate — Claude Code specifically showed a 3.2% secret-leak rate versus a 1.5% baseline across all public commits
  • In 2025, AI service secrets detected on public GitHub reached 1,275,105 — an 81% year-over-year increase, driven largely by AI coding tools embedding API keys into committed code
  • GitGuardian found 24,008 unique secrets in publicly accessible MCP (Model Context Protocol) configuration files — of those, 2,117 were confirmed live credentials, valid keys sitting exposed in public repositories
  • Real-world incidents have confirmed that AI agents commit .env file contents directly into repository history

What Gets Leaked

The most common secrets found in AI-assisted commits:

  • Database passwords — from .env files loaded into agent context
  • API keys — Stripe, OpenAI, AWS, Twilio, SendGrid, and others
  • Connection strings — a single string that bundles hostname, username, and password together
  • MCP configuration credentials — service tokens used by AI agent tool integrations
  • OAuth client secrets — from provider setup documentation pasted into prompts

Why Git History Makes This Permanent

When a credential is committed to a git repository, deleting the file or replacing the value in a later commit does not remove it. The credential is preserved in git history and can be retrieved by anyone with repository access:

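# List every commit, on any branch, that touched .env (including the one that deleted it)
git log --all --full-history -- .env
# Print .env as it existed in a given commit; for the commit that deleted it,
# use its parent instead: git show <commit-hash>^:.env
git show <commit-hash>:.env

# Or view any file from an older commit that still contained the literal value
git log --oneline
git show <old-commit>:src/database.js  # the credential is still here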

For public repositories, this is catastrophic. Automated secret-scanning bots — programs that continuously crawl GitHub and similar platforms looking for exposed credentials — can capture a leaked secret within minutes of a push. A credential that was public for even a brief window has very likely already been captured, even if the repository is made private afterward.

The MCP Config Risk

MCP (Model Context Protocol) is a protocol that lets AI agents connect to external tools and services such as GitHub, databases, or Slack. When developers configure Claude Code, Cursor, or similar tools to use MCP servers, the configuration file contains authentication tokens for those external services.

These config files are frequently created inside project directories. If they are not excluded from version control, they get committed alongside the project code — exposing tokens for every connected service.

❌ .mcp.json accidentally committed to git; in a real leak, both env values below are live credentials:
{
  "mcpServers": {
    "github": {
      "command": "mcp-github",
      "env": {
        "GITHUB_TOKEN": "ghp_xxxxxxxxxxxxxxxxxxxx"
      }
    },
    "database": {
      "command": "mcp-postgres",
      "env": {
        "DATABASE_URL": "postgresql://admin:s3cr3t@db.prod.example.com/app"
      }
    }
  }
}
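A config like this can be checked mechanically before committing. The sketch below uses a hypothetical helper, `literal_env_values`, that walks the `mcpServers.<server>.env` structure shown above and reports every value written literally into the file. One assumption is built in: a value of the form `${NAME}` is treated as a reference that the client expands from the environment. Not every MCP client supports that syntax, so check your client's documentation.

```python
import json


def literal_env_values(mcp_config: dict) -> list[tuple[str, str]]:
    """Return (server, variable) pairs whose value is stored literally in the config.

    Assumption: "${NAME}" is an environment reference expanded by the MCP client
    at launch time; anything else is a value stored in the file itself.
    """
    findings = []
    for server, spec in mcp_config.get("mcpServers", {}).items():
        for name, value in spec.get("env", {}).items():
            value = str(value)
            is_reference = value.startswith("${") and value.endswith("}")
            if value and not is_reference:
                findings.append((server, name))
    return findings


# Usage: the GitHub entry holds a literal token; the database entry only a reference
config = json.loads("""
{
  "mcpServers": {
    "github":   {"command": "mcp-github",   "env": {"GITHUB_TOKEN": "ghp_xxxxxxxxxxxxxxxxxxxx"}},
    "database": {"command": "mcp-postgres", "env": {"DATABASE_URL": "${DATABASE_URL}"}}
  }
}
""")
print(literal_env_values(config))  # [('github', 'GITHUB_TOKEN')]
```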

The Provider Logging Risk

So far we have looked at one leakage path: .env values end up in generated code, which then gets committed to git. There is a second, less obvious path: the contents of the AI agent's entire context window are sent to the AI provider's servers as part of every API request — and those servers may retain that data for days.

Diagram: Two Paths for Credential Leakage from the AI Context Window

Every time an AI coding agent makes a request to the underlying model — to generate a function, explain an error, or suggest a refactor — it sends its entire context window to the provider's API. If that context includes your .env file contents (which the agent loaded automatically), those credentials travel with the request.

What AI Providers Retain by Default

Many developers do not realize that AI provider APIs retain prompts by default, for periods ranging from days to weeks.

Default data retention for AI coding agent API calls (verify with current provider policy)

| Provider | Default Retention | Used for Training? | Zero-Retention Option |
|---|---|---|---|
| OpenAI API | 30 days (abuse monitoring) | No (API tier) | Enterprise ZDR agreement required |
| Anthropic API | 7 days (reduced from 30 days in Sept 2025) | No (API tier) | Enterprise ZDR agreement required |
| GitHub Copilot Business/Enterprise | Short-term, telemetry-controlled | No | Enterprise tier controls |
| Cursor | Sent to underlying provider; Privacy Mode available | Varies by model | Privacy Mode (must be explicitly enabled) |

Policies change frequently. Always verify against the provider's current Terms of Service and Privacy Policy before handling sensitive data.

"Zero Data Retention" (ZDR) means the provider processes the request entirely in memory and writes nothing to persistent storage — not even for safety monitoring. ZDR is available from both OpenAI and Anthropic, but it requires an enterprise contract and is not enabled by default for standard API accounts.

Under default settings, a secret that enters an AI prompt may remain in provider logs for 7 to 30 days, depending on the provider. If the provider's infrastructure is breached during that window, or if the logs are accessed during an internal investigation, the secret is exposed through a path completely separate from your version control.

The Prompt as an Attack Surface

This risk goes beyond automatic .env loading. Developers sometimes paste credentials directly into prompts:

# ❌ Never do this — the entire message is sent to and logged by the provider
"Here is my database connection string: postgresql://admin:s3cr3tP@ss@db.prod.example.com/app
Can you help me write a connection pool for this?"

A prompt is not a private channel. It is an API request payload that passes through the provider's infrastructure, may be logged, and is subject to the provider's data retention policy. Treat every prompt as if it could appear in a log file — because under default settings, it will.
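If something must be pasted into a prompt, a local redaction pass can serve as a last line of defense. It is not a substitute for keeping secrets out of prompts. The sketch below assumes you already know which values are secret, for example the values in your own `.env`. `redact_known_secrets` is a hypothetical helper, and it cannot catch secrets it was not told about.

```python
def redact_known_secrets(text: str, secrets: dict[str, str]) -> str:
    """Replace each known secret value in text with a <VARIABLE_NAME> placeholder."""
    # Replace longer values first so a secret that contains another secret is
    # masked whole instead of being left partly visible.
    for name, value in sorted(secrets.items(), key=lambda item: len(item[1]), reverse=True):
        if value:  # an empty value would match between every character
            text = text.replace(value, f"<{name}>")
    return text


prompt = (
    "Here is my database connection string: "
    "postgresql://admin:s3cr3tP@ss@db.prod.example.com/app\n"
    "Can you help me write a connection pool for this?"
)
print(redact_known_secrets(prompt, {"DB_PASSWORD": "s3cr3tP@ss"}))
# Here is my database connection string: postgresql://admin:<DB_PASSWORD>@db.prod.example.com/app
# Can you help me write a connection pool for this?
```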

Example Finding: Credentials Committed via AI Context Window
Severity: High · Category: 2.6 Secret Leakage · CWE-798 (Use of Hard-coded Credentials)

The AI had .env loaded in context. It wrote the literal values from the .env file into the generated code instead of referencing the environment variable names.

Preventive Setup: The .gitignore Checklist

Run through this checklist before your first AI-assisted session on any project:

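# Add each pattern to .gitignore only if it is not already listed
# (grep -qxF matches the whole line literally; 2>/dev/null covers a missing .gitignore)
add_ignore() { grep -qxF "$1" .gitignore 2>/dev/null || echo "$1" >> .gitignore; }

# Environment files
add_ignore ".env"
add_ignore ".env.local"
add_ignore ".env.*.local"
add_ignore ".env.production"

# MCP configuration files
add_ignore "claude_desktop_config.json"
add_ignore ".mcp.json"
add_ignore "mcp-config.json"

# Verify the file is excluded before continuing
git check-ignore -v .env
# Expected output: .gitignore:<line>:.env followed by a tab and .env (the line number depends on your file)
# If there is no output: the file is NOT excluded; fix .gitignore before proceeding
# .gitignore does not affect files git already tracks. If .env was committed earlier,
# stop tracking it (the local file is kept) with: git rm --cached .env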

Configuring Agent Ignore Lists

.gitignore prevents files from being committed to version control, but it does not prevent the AI agent from loading them into its context window. These are two separate protections, and you need both.

Most AI coding tools have their own mechanism for excluding files from the agent's context:

# Claude Code: add permissions.deny rules such as "Read(./.env)" to .claude/settings.json
# Codex: add to .codex/config.toml
# Cursor: add to .cursorignore (same syntax as .gitignore)
# GitHub Copilot: configure content exclusion in repository or organization settings

# .cursorignore example
.env
.env.*
*.pem
*.key
claude_desktop_config.json
.mcp.json
secrets/
credentials/

The goal is to prevent the agent from ever seeing the credential values in the first place — so they cannot end up in generated code or in the context window sent to the provider.
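How a given path is matched against such a list can be approximated in a few lines. The sketch below is a simplified, hypothetical matcher (`is_hidden_from_agent`), not the matcher any of these tools actually uses. A pattern ending in `/` excludes anything inside a directory of that name. Any other pattern is matched against the file name with Python's `fnmatchcase`. Full `.gitignore` semantics, such as anchored paths, `**`, and `!` negation, are richer than this.

```python
from collections.abc import Iterable
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The patterns from the .cursorignore example above
AGENT_IGNORE = (".env", ".env.*", "*.pem", "*.key",
                "claude_desktop_config.json", ".mcp.json", "secrets/", "credentials/")


def is_hidden_from_agent(path: str, patterns: Iterable[str] = AGENT_IGNORE) -> bool:
    """Approximate whether a repo-relative POSIX path is excluded by the patterns."""
    parts = PurePosixPath(path).parts
    directories, filename = parts[:-1], parts[-1]
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: matches if any parent directory has this name
            if pattern.rstrip("/") in directories:
                return True
        elif fnmatchcase(filename, pattern):
            # File pattern: glob-match against the final path component only
            return True
    return False


for p in ["src/db.py", ".env", ".env.production", "config/secrets/api.txt", "certs/server.pem"]:
    print(f"{p:25} hidden={is_hidden_from_agent(p)}")
```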

Never grant AI agents direct access to credential stores. If your team uses a secrets manager (AWS Secrets Manager, HashiCorp Vault, 1Password Secrets Automation), the AI agent should not have credentials to access it. Agents should work with placeholder variable names, not with actual secret values. The correct pattern is: the agent generates code that references os.environ["DB_PASSWORD"], and the actual value is injected at runtime by your deployment infrastructure — never written by the AI.
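One way to give an agent the variable names without the values is a placeholder file with every value blanked. Such a file is often committed as `.env.example`, though that name is a common convention rather than a requirement. `make_env_example` below is a hypothetical helper. It is sketched under the assumption that every entry is a one-line `NAME=value` (optionally prefixed with `export`). Lines it does not recognize are dropped rather than copied.

```python
import re

# NAME=value, optionally prefixed with "export "; NAME must be a shell-style identifier
ENTRY = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=")


def make_env_example(env_text: str) -> str:
    """Return .env content with every value removed, keeping names and comments.

    Assumes one-line entries; multi-line values are not handled, so review the result.
    """
    out = []
    for line in env_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)  # blank lines and comments are copied as-is (review comments too)
            continue
        match = ENTRY.match(line)
        if match:
            out.append(f"{match.group(1)}=")  # keep the name, drop the value
        # any other line is skipped rather than copied
    return "\n".join(out) + "\n"


print(make_env_example("# Database\nDB_HOST=db.prod.example.com\nDB_PASSWORD=s3cr3t\n"), end="")
# # Database
# DB_HOST=
# DB_PASSWORD=
```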

Scanning Before Committing

Even with .gitignore configured correctly, run a secret scanner before every push. .gitignore only excludes whole files. It cannot catch a literal credential that the AI wrote into an ordinary source file:

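# Using trufflehog (recommended)
# --only-verified reports only secrets trufflehog could confirm are live; anything it
# cannot verify is not reported, so consider also running without the flag
trufflehog git file://. --since-commit HEAD --only-verified

# Using git-secrets (patterns must be registered first, e.g. git secrets --register-aws)
git secrets --scan

# Using gitleaks (scans the repository's git history)
gitleaks detect --source . -v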
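The tools above are the real safeguard. To show the basic idea they build on, here is a deliberately small Python sketch that checks only the lines a change adds, using three illustrative regular expressions. `scan_added_lines` and its rule set are assumptions for illustration. They are far from the coverage, entropy checks, and live verification that real scanners provide. The input is unified diff text, such as the output of `git diff --cached` for staged changes.

```python
import re

# Illustrative rules only; real scanners ship hundreds of detectors.
RULES = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b"),
    # scheme://user:password@host, the connection-string shape shown earlier
    "Password in URL": re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s/]+@", re.IGNORECASE),
}


def scan_added_lines(diff_text: str) -> list[tuple[int, str]]:
    """Return (line number within the diff, rule name) for each suspicious added line."""
    findings = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        # Added lines start with "+"; "+++" is the file header, not content
        if line.startswith("+") and not line.startswith("+++"):
            for rule, pattern in RULES.items():
                if pattern.search(line):
                    findings.append((lineno, rule))
    return findings


sample = """\
--- a/src/database.py
+++ b/src/database.py
@@ -1,2 +1,3 @@
 import os
-DB_URL = os.environ["DATABASE_URL"]
+DB_URL = "postgresql://admin:s3cr3t@db.prod.example.com/app"
+REGION = "eu-west-1"
"""
print(scan_added_lines(sample))  # [(6, 'Password in URL')]
```

In a pre-commit hook, a non-empty result would block the commit. That wiring is left out of this sketch.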

If you find a credential that has already been pushed, rotate it immediately — then clean up the history. A rotated credential that is still in history is far less dangerous than an active credential that remains there.

Sources: