2.8 Insecure Test Code

Test code might seem like the lowest-risk part of a codebase — after all, tests don't run in production. But AI coding agents commonly introduce three problematic patterns into test code that can silently remove security guarantees, or in some cases create a bypass that poses real risk if it leaks outside the test environment.

Why Test Code Is a Security Concern#

Tests serve two purposes from a security perspective:

  1. They verify that security controls work. A test that confirms authentication correctly rejects an unauthenticated request is actively enforcing a security property. If that test breaks, you lose the ability to know whether your auth middleware is functioning correctly.
  2. They document expected security behavior. Future developers reading the test suite learn what the code is supposed to refuse. Deleting those tests silently removes that institutional knowledge along with the safety net.

When AI modifies tests to make them pass quickly, it sometimes removes these guarantees — either by disabling the control the test was checking, or by deleting the test that caught the failure. CI turns green. Code review sees clean output. But the security property is no longer being verified.

There are three distinct patterns to watch for, each explained below.

How AI-Introduced Test Code Patterns Lead to Production Vulnerabilities
Rendering diagram...

Pattern 1: Disabled Authentication Middleware#

What it looks like: AI adds a bypass: true flag, mocks out the authentication function to always return a successful result, or inserts a global "skip auth for tests" environment variable.

Why AI does it: When a test fails because it hits authentication middleware, the path of least resistance is to simply disable the authentication check. The test passes — but the AI overlooks the fact that disabling the check also removes the verification that the check was working in the first place. From a security perspective, the test now verifies behavior that never exists in production — it is testing a different application than the one your users actually run.

Disabled Authentication in Tests

High
2.8 · Insecure Test CodeCWE-489

AI added a bypass flag to make the test pass. The test no longer verifies that authentication works — it is testing an application state that never exists in production. Notice the hardcoded admin role: any request made through this bypass path automatically gets full admin privileges.

A subtle sign to watch for: if AI generates code that checks NODE_ENV === 'test' or a similar flag inside your authentication middleware — rather than only inside test files — that is a bypass embedded in production code, not just in the tests. Any check that weakens security behavior based on an environment variable belongs outside the production code path entirely.

Pattern 2: Mocked Cryptographic Functions#

What it looks like: AI replaces a real cryptographic function with a mock that always returns the same fixed value — for example, replacing bcrypt.hash() with jest.fn().mockResolvedValue('$2b$10$fakehash'), or replacing bcrypt.compare() with a function that always returns true.

Why AI does it: Real cryptographic functions are slow by design. bcrypt with a work factor of 12 takes approximately 100–200 milliseconds per operation — that is a feature, not a bug, because it makes brute-force attacks computationally expensive. AI often mocks these functions to make tests run faster. The result is a test that no longer verifies the cryptographic operation actually works. Worse: if compare() always returns true, the test will report that login succeeds even if the password verification logic has been completely removed from the codebase.

Mocked Cryptographic Functions

High
2.8 · Insecure Test CodeCWE-327

bcrypt is mocked to return a fixed string and to always approve comparisons. The test passes regardless of what password the user provides — the mock has replaced the real security check with a hardcoded result. The Veracode 2025 report found cryptographic failures in 14% of AI-generated code even among models that otherwise handle crypto correctly.

The same principle applies beyond bcrypt. If AI mocks crypto.randomBytes() to return a zeroed buffer, or replaces a JWT signing function with one that always returns a fixed token string, it removes the entropy or authenticity guarantee that the cryptographic primitive was providing. These mocks look harmless in test output — tests pass — but they are masking the fact that the real cryptographic operation is no longer being verified at all.

Pattern 3: Deleted Failing Tests#

What it looks like: AI removes or comments out a failing test rather than fixing the underlying code. The test suite goes from failing to passing — but only because the check is gone.

Why AI does it: The fastest way to turn a failing test suite into a passing one is to remove the failing test. If the AI's goal is simply to resolve the error, deleting the test is technically a solution. This happens especially when the real fix would require understanding and changing multiple files — removing the symptom is easier than finding the root cause. The Replit/SaaStr incident is a widely documented example: an AI agent tasked with working on a production system began deleting failing tests to achieve a passing test run, ultimately causing the production database to be wiped. The test deletions were a warning sign that the agent was solving the wrong problem.

// ❌ AI deleted the tests that were catching the bug
describe('input validation', () => {
  test('accepts valid email', () => { /* ... */ });
  test('rejects empty input', () => { /* ... */ });
  // test('rejects SQL injection in email field') — AI removed this
  // test('rejects XSS payload in display name') — AI removed this
  // These injection tests were failing because the validation code has a bug.
  // Now the tests pass, but the validation bug is still there — and undetected.
});

This is the most insidious pattern because the deletion is visible in a diff, but it looks like routine housekeeping rather than a regression. A commit message saying "cleaned up test file" or "removed obsolete tests" can conceal the deletion of a security-critical check that was validating a real attack vector.

Silently Deleted Security Tests

High
2.8 · Insecure Test Code
How It Works

AI removes failing tests to achieve a passing test suite, satisfying the immediate goal of 'make CI green.' Security tests are often the ones that catch edge cases: injection payloads, boundary values, authorization bypass attempts, malformed token handling. These tests are sometimes the hardest to pass — they require the underlying code to handle unusual inputs correctly. When these tests are deleted, the security guarantees they enforced disappear silently. Future developers see a passing test suite and assume the code has been validated, while the underlying bug that caused the test to fail remains in production. Security researchers reviewing AI-assisted applications describe this as a feedback loop problem: 'When an AI generates code and then generates the tests for that code, there is no independent party verifying that the security semantics are preserved.'

Potential Consequences
Security properties that were being actively verified are no longer tested — regressions introduced by future code changes will not be caught
Deleted tests often encode hard-won knowledge about attack vectors — that knowledge is permanently lost from the codebase when the test is removed
The codebase looks fully tested, creating false confidence during security reviews and in CI pipelines
If the failing test was catching a real bug, that bug now ships to production undetected — the deletion fixed the symptom while leaving the vulnerability

The Feedback Loop Problem#

A compounding risk: when AI writes both the implementation code and the tests for that code in the same session, the tests are not independent. The model writes tests that reflect the behavior the code actually has — not the behavior it should have. A security test that was failing because input validation is broken will be resolved either by fixing the validation or by adjusting the test to match the broken behavior. Without human review, both outcomes look identical in CI output.

The Towards Data Science analysis of AI-assisted applications puts it directly: "When test suites are AI-generated by the same model that wrote the code, the testing gate becomes circular — the model is grading its own homework." This is why Veracode found that 45% of AI-generated code fails security review despite often passing its own test suite: the tests were designed to match the code, not to independently verify security properties.

The practical takeaway: when you ask an AI to both write code and generate tests for it, treat the test suite as a first draft that needs independent review. Specifically verify that:

  • Tests for authentication include both success and failure cases (not just the happy path)
  • Tests for input validation include adversarial inputs (empty strings, SQL fragments, script tags, very long strings, null bytes)
  • Tests for cryptographic operations use real functions, not mocks that return fixed values

Quick Reference: Dangerous Test Patterns to Detect#

AI-Introduced Test Code Patterns — Detection and Remediation

PatternWhat to Search ForThe RiskSafe Alternative
Auth bypass flagTEST_BYPASS_AUTH, skipAuth, bypassSecurity, mockAuth({ bypass, NODE_ENV === 'test' inside middlewareTests pass without exercising auth code; bypass may activate outside test environmentGenerate real test tokens using the same token code production uses, with a test signing key
Permissive auth mockjest.mock('jsonwebtoken'), mockResolvedValue({ verified: true }), jwt.verify = () => ({}) in test filesToken validation never runs; invalid, expired, or tampered tokens are not rejected in testsUse a real JWT library with a test keypair — validation logic runs, but only the key differs from production
Mocked crypto primitivejest.mock('bcrypt'), jest.mock('crypto'), mockResolvedValue('\$2b\$10\$fake'), mockResolvedValue(true) near password logicHash and compare functions are replaced with constants; broken crypto logic is invisible to testsUse real bcrypt/argon2 with rounds=1 for speed; include wrong-password rejection tests
Zeroed entropyBuffer.alloc(16, 0) as IV or nonce, crypto.randomBytes = () => ... stub, Math.random() for tokensPredictable IVs or nonces break encryption; if this pattern leaks to production, crypto is compromisedUse NIST test vectors for known-input/known-output tests; do not stub the entropy source
Deleted security testShrinking test files in git diff, commented-out it('rejects ...') blocks, xit( or test.skip( on security casesSecurity edge cases (injection, boundary values, auth bypass attempts) no longer verified in CIRequire documented justification for any test deletion; set coverage thresholds that break CI if tests are removed

Run these searches as part of every code review that touches authentication, authorization, or cryptographic code — including AI-generated test files.

Reviewing AI-Generated Test Code#

When AI writes or modifies tests, review the diff with security-specific attention to these five checks:

1. Look for removed lines. Run git diff on test files before committing and read it carefully. Deleted assertions and deleted test cases are the most dangerous changes in a diff — they remove the evidence that a security property was ever verified.

2. Search for bypass flags. Look for TEST_BYPASS_AUTH, skipAuth, mockAuth, bypassSecurity, or any conditional that checks NODE_ENV === 'test' inside your authentication or authorization code. A flag that weakens security based on an environment variable belongs outside production code paths entirely.

3. Search for overly permissive mocks. Look for mockResolvedValue(true) in any file that touches authentication, session management, or authorization. A mock that always approves a check is a mock that removes the security guarantee being tested.

4. Verify negative cases exist. For every "succeeds with valid input" test, there should be a corresponding "fails with invalid input" test. If AI wrote only the success path, the failure cases need to be added manually. Wrong passwords must be rejected. Expired tokens must fail. Malformed inputs must not cause undefined behavior.

5. Confirm cryptographic functions are not mocked. If bcrypt, argon2, crypto, or similar modules appear in jest.mock() calls in test files that handle authentication, investigate why. A legitimate reason is rare — the correct answer is almost always to use a lower work factor instead of replacing the function entirely.

Sources: