2.1 The Evidence: How Often AI Gets Security Wrong

Before diving into specific vulnerability patterns, let's look at how widespread the problem actually is. This section walks through independent research — real studies with real numbers — so you can see exactly how much extra scrutiny AI-generated code deserves.

The short answer: far more scrutiny than most developers apply today.

What the Research Shows

Seven independent research efforts, drawn from university labs, security companies, and production telemetry, have examined the security quality of AI-generated code. They used different methods and tested different AI models, yet they all reached the same conclusion: AI-generated code carries significant security risk.

Independent research findings on AI-generated code security

Study	Scope	Key finding
Stanford / ACM CCS 2023	47 developers, 5 coding tasks, controlled experiment	Developers using AI wrote significantly less secure code on 4 of 5 tasks — and were more confident it was secure than developers who worked without AI.
NYU CCS / IEEE S&P 2022	89 scenarios, 1,689 Copilot-generated programs	~40% of generated programs contained at least one exploitable security vulnerability.
Veracode 2025	80 tasks, 100+ LLMs, 4 languages	45% of AI-generated code samples failed security review. Java had the highest failure rate at 72%. Cross-site scripting tasks failed 86% of the time. Security performance has not improved meaningfully across two years of model upgrades.
Armis Trusted Vibing Benchmark 2026	18 AI coding models, 31 test scenarios	100% failure rate — every model failed at least one security-critical scenario. Weakest areas: memory management, authentication, and file handling.
Apiiro (production telemetry, 2025)	Tens of thousands of repos, Fortune 50 enterprises	AI-assisted developers produced 3–4x more commits. Security findings increased 10x. Privilege escalation vulnerabilities jumped 322%.
Escape.tech (2025)	5,600+ publicly deployed vibe-coded apps	2,000+ vulnerabilities, 400+ exposed secrets, 175+ instances of exposed personal data (medical records, IBANs, phone numbers).
GitGuardian (2026)	Billions of public GitHub commits in 2025	AI-assisted commits had a 3.2% secret-leak rate versus a 1.5% baseline — roughly twice the rate of human-only commits.

Sources: arXiv 2211.03622 (Stanford), arXiv 2108.09293 (NYU/IEEE S&P), Veracode GenAI Code Security Report (2025), Armis Trusted Vibing Benchmark (2026), Apiiro blog (Sep 2025), Escape.tech (2025), GitGuardian State of Secrets Sprawl 2026.

Because these studies span different research groups, time periods, and AI models, the problem cannot be attributed to one tool or one programming language. It appears to be a characteristic of how current AI models generate code.

The Confidence Gap: The Most Dangerous Finding

Of all these studies, the Stanford experiment produced the most alarming result — not because it found the highest vulnerability rate, but because it uncovered the most dangerous combination: worse security outcomes paired with higher developer confidence.

Researchers gave 47 participants five coding tasks where security mattered, including encrypting data, creating cryptographic signatures, preventing SQL injection (an attack where user input is crafted to manipulate a database query), and controlling file access. One group had access to an AI assistant; the other group worked without one.

The results on the message-signing task illustrate the pattern clearly:

Only 3% of AI-assisted developers produced a secure implementation
21% of developers working without AI produced a secure implementation
Yet the AI-assisted group rated the AI's trustworthiness at 4 out of 5
The developers who produced secure code rated the AI at just 1.5 out of 5

The developers who trusted the AI most were the ones who got the worst security outcomes.

[Diagram: The Confidence Gap: How AI Creates a False Sense of Security]

The mechanism is straightforward: AI-generated code is well-formatted, compiles cleanly, and handles typical use cases correctly. There are no obvious typos, no structural mistakes, no immediate red flags. So the developer concludes the code must be fine — and skips the security review that would have caught the real problem.

Security flaws almost never look broken on the surface. They hide in logic: using the wrong cryptographic algorithm (the mathematical method used to encrypt or hash data), missing an authentication check, or allowing user input to flow directly into a database query without validation. None of these cause syntax errors or make basic tests fail — unless you write tests specifically designed to check for them.
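To make the point about tests concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, the user rows, and the function names are hypothetical and chosen only for illustration. Both lookup functions pass the ordinary functional test; only the test written with an attacker's input shows that one of them is unsafe.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # User input is concatenated straight into the SQL text.
    query = "SELECT name, email FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # The ? placeholder keeps user input separate from the query structure.
    return conn.execute(
        "SELECT name, email FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()

# Functional test: both versions return the expected row.
assert find_user_unsafe(conn, "alice") == [("alice", "alice@example.com")]
assert find_user_safe(conn, "alice") == [("alice", "alice@example.com")]

# Security test: only an adversarial input reveals the difference.
payload = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: every row is returned
print(len(find_user_safe(conn, payload)))    # 0: payload is treated as a plain name
```

A test suite that stops at the two functional assertions would report both functions as correct, which is exactly the false signal described above.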

The Stanford researchers identified three mechanisms behind this confidence gap:

Wrong library or algorithm choice. The AI suggests something that works but is insecure by modern standards. For example, an encryption function runs, its output looks like encrypted data, and the developer moves on — unaware that the algorithm itself is the problem, not the code syntax.

Edge case blindness. The AI handles the main use case correctly but silently ignores critical edge cases — a file path that escapes outside the intended directory, an input that breaks out of a SQL query, or a JWT (JSON Web Token, a common format for authentication) with missing signature verification. A quick functional test passes, so the dangerous case goes unnoticed until much later.

Testing displacement. Instead of focusing on writing secure code, developers shift their attention to simply verifying that the AI's output runs. That is a fundamentally different — and far less rigorous — mindset. One participant in the study captured it perfectly: "I don't remember if the key has to be prime or something, but we'll find out... I will test this later but I'll trust my AI for now."
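The edge case blindness mechanism can be illustrated with the file path example. The sketch below is one possible way to reject paths that escape an intended directory, using only Python's standard library. The function name resolve_inside and the /srv/uploads directory are hypothetical, and the sketch does not cover every concern a real file-serving path has (permissions and race conditions, for instance).

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Return the resolved path, or raise ValueError if it escapes base_dir."""
    base = Path(base_dir).resolve()
    # Joining and then resolving collapses ".." segments and follows symlinks.
    candidate = (base / user_path).resolve()
    # relative_to raises ValueError when candidate is not located under base.
    candidate.relative_to(base)
    return candidate


# The first input is the "main use case"; the other two are the edge cases
# that a quick functional test never exercises.
for attempt in ["reports/q1.txt", "../../etc/passwd", "/etc/passwd"]:
    try:
        resolve_inside("/srv/uploads", attempt)
        print(attempt, "-> allowed")
    except ValueError:
        print(attempt, "-> rejected")
```

A version that simply joins the base directory and the user-supplied path would behave identically on the first input, which is why the problem goes unnoticed when only the main use case is tested.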

The developers who produced the most secure code while using AI were the ones who:

Actively distrusted the output
Rephrased prompts to probe edge cases
Verified results independently

Skepticism was the deciding factor — not the AI tool itself, not the programming language, not the framework. Trusting the AI less led to better security outcomes.

Why AI Generates Insecure Code: The Training Data Root Cause

AI models do not intentionally write insecure code. The problem is more subtle: they were trained on billions of lines of publicly available code, and a large portion of that code was written before modern security practices existed — or by developers who prioritized getting things working over making them secure.

When an AI generates code, it predicts the most statistically likely completion based on patterns in its training data. Modern agentic tools can explore your project and reason about context — but unless security requirements are made explicit in the prompt or project instructions, they still default to completing the functional pattern. In other words, the AI reproduces what it has seen most often — not what is most secure for your specific situation — unless you tell it security matters.

[Diagram: Why Training Data Bias Produces Insecure Output]

Here is how this plays out in practice:

String-concatenated SQL queries were the dominant pattern in tutorials from the 1990s and 2000s. Parameterized queries — the safer approach that separates user input from the query structure — exist in the training data, but string concatenation appears far more frequently.
MD5 and SHA-1 appear far more often in historical code than bcrypt or Argon2, even though MD5 and SHA-1 have been considered unsafe for password hashing for well over a decade. Bcrypt and Argon2 are modern alternatives specifically designed to protect user passwords.
Wildcard CORS headers (Access-Control-Allow-Origin: *) — which tell the browser to allow any website to access your API — are common in introductory tutorials because they eliminate setup friction. A properly restricted CORS configuration requires more code and appears far less often in beginner-oriented examples.
API keys hardcoded into source files dominate early tutorials and quick-start guides because loading secrets from environment variables (configuration values stored outside your code, so they never get accidentally committed to version control) requires extra setup that beginner examples typically skip.
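As an illustration of the password hashing item in the list above, the following sketch contrasts the historically common pattern with a salted, deliberately slow key derivation. It uses PBKDF2 from Python's standard library as a stand-in for bcrypt or Argon2 so that it has no third-party dependencies; that substitution, the function names, and the iteration count are all assumptions made for this example, not recommendations taken from the studies.

```python
import hashlib
import hmac
import os

# Illustrative value only; check current guidance before choosing one.
ITERATIONS = 600_000


def hash_password_weak(password: str) -> str:
    # Pattern common in older code: a fast, unsalted MD5 digest.
    return hashlib.md5(password.encode()).hexdigest()


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) using salted PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(key, expected)


# Identical passwords always produce the same unsalted MD5 digest.
print(hash_password_weak("hunter2") == hash_password_weak("hunter2"))  # True

salt, key = hash_password("hunter2")
print(verify_password("hunter2", salt, key))  # True
print(verify_password("wrong", salt, key))    # False
```

Both approaches "work" in the sense that a login test passes, which is why the weaker one survives a purely functional check.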

The model defaults to completing the most common pattern it has seen. It can reason about security when prompted — but it won't proactively audit for attack vectors unless your prompt or project configuration makes security an explicit requirement.
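For the hardcoded API key pattern from the list above, the extra setup that beginner examples skip is small. The sketch below reads a secret from an environment variable and fails loudly when it is missing. The variable name PAYMENT_API_KEY and the function name load_api_key are hypothetical.

```python
import os


def load_api_key(var_name: str = "PAYMENT_API_KEY") -> str:
    """Read a secret from the environment and fail loudly if it is missing."""
    value = os.environ.get(var_name, "").strip()
    if not value:
        # Raising here is safer than silently falling back to a default key.
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return value
```

Calling load_api_key() once at startup means a missing secret stops the application immediately instead of surfacing later as a confusing authentication failure, and the key itself never appears in version control.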

This is why the Veracode benchmark found that security performance has remained largely flat across two years of model upgrades. Functional code quality has improved dramatically — models generate more correct, more idiomatic code than ever — but security has not kept pace, because the underlying training data bias has not changed.

Vibe Coding Amplifies Every Risk

Vibe coding is a development style where software is built almost entirely through AI prompts, with minimal manual review or hands-on coding. Rather than writing code themselves, the developer describes what they want in natural language and accepts what the AI produces, refining through prompts until the application appears to work correctly.

Vibe coding makes the confidence gap worse for a specific reason: when you are not writing the code yourself, you have far less visibility into what the generated code actually does. Security flaws in the underlying logic go unnoticed because you never read the code that contains them.

The Escape.tech scan of over 5,600 publicly deployed vibe-coded applications found:

More than 2,000 security vulnerabilities
Over 400 exposed secrets
175+ instances of exposed personal data — including medical records, bank account numbers (IBANs), and phone numbers

The Apiiro production data reinforces this at enterprise scale: across Fortune 50 companies, AI-assisted development produced 3–4x more commits — but security findings increased by 10x, with privilege escalation vulnerabilities jumping 322%. Each AI-assisted pull request was also larger and touched more files, meaning each merge carried a larger blast radius if something went wrong.

The pattern is consistent: AI accelerates how fast code is written while multiplying security risk at the same time. More code ships per hour, more of it contains vulnerabilities, and less of it gets manually reviewed — because developers assume the AI has already handled the obvious problems.

This guide does not tell you to stop using AI assistance. It tells you that the security review step cannot be handed off to the same AI that wrote the code.

The Confidence Gap

High


How It Works

AI-generated code looks polished and passes basic functional tests, creating a false signal of overall quality. Security flaws live in logic — wrong algorithm choice, missing authentication check, unsanitized user input flowing into a database — not in syntax. A linter will not catch them. A test that only exercises the expected path will miss them. The only reliable way to catch security flaws is through deliberate security review, which developers skip more often when AI has given them a false sense of confidence.

Potential Consequences

Developers skip security review because the code looks professional — the review you skip without noticing is the most dangerous one

AI-assisted developers produce 3–4x more commits, so each skipped review compounds across a much larger attack surface

The Armis Trusted Vibing Benchmark (2026) found a 100% failure rate across 18 models: every model failed at least one security-critical scenario, so no model's output can be assumed secure without review

Veracode (2025) found no meaningful security improvement across two years of model upgrades, even as syntactic accuracy improved dramatically

What to Do With This Information

The research does not say AI coding tools are useless. It says they shift where the security risk lives.

When you write code by hand, you are actively engaged with the logic. You think through edge cases as you type. You notice when something feels off. When AI writes the code, that active engagement disappears — and you need to bring it back deliberately during the review step.

Every section that follows in this chapter describes a specific vulnerability pattern to look for during that review — from missing input validation and hardcoded credentials to unprotected API routes and insecure test code. Understanding why AI tends to produce each pattern is what will help you catch it before it ships.
