3.5 Denial of Wallet — When AI Becomes Expensive on Purpose

Traditional Denial of Service (DoS) attacks try to bring a system down. Denial of Wallet takes the opposite approach: the system stays up, processes every request, and looks completely normal — while quietly accumulating API charges that can reach tens of thousands of dollars in a single day.

This is the billing dimension of OWASP LLM10:2025 — Unbounded Consumption. OWASP originally tracked this threat as "Model Denial of Service," but renamed and expanded it in the 2025 edition to explicitly cover cost-based attacks. The risk is fundamentally different from a traditional DoS: the attacker's goal is not to disrupt your service — it is to cause financial damage. The service continues working exactly as designed. The problem is that nothing in your application limits how much work it will do, or how much that work will cost you.

This makes Denial of Wallet especially dangerous for developers shipping their first LLM-powered feature. A chatbot with no rate limiting, no spending cap, and an openly accessible endpoint is not just a security risk — it is a financial liability that can generate a catastrophic bill before anyone notices anything is wrong.

DoS vs. Denial of Wallet#

Denial of Service vs. Denial of Wallet — key differences

	Denial of Service (DoS)	Denial of Wallet (DoW)
Goal	Make the service unavailable	Drain the billing account
What users see	Service is down or unreachable	Service works normally
What you see	Uptime alerts, error rate spikes	Nothing — until the invoice arrives
How it's detected	Monitoring and uptime dashboards	Billing alerts and spending dashboards
Attack tool	High request volume to saturate bandwidth	Crafted inputs, stolen API keys, missing limits
Traditional defense	CDN, traffic scrubbing, DDoS protection	Rate limiting, token caps, spending alerts
When you notice	Immediately — the service is down	Days or weeks later when the bill arrives

The last row is the most important difference. A successful DoS attack is immediately obvious — users can't reach your service. A successful Denial of Wallet attack may go completely unnoticed until your monthly invoice appears, unless you have spending alerts configured in advance.

How AI API Billing Creates the Risk#

AI providers charge per token — a rough unit of text that is approximately four characters, or three-quarters of an average English word. Both input tokens (what you send to the model) and output tokens (what the model generates in response) are billed, with output typically costing more. As a concrete example, GPT-4o charged approximately $2.50 per million input tokens and $10 per million output tokens in early 2026.

This pricing model has a critical asymmetry that traditional web APIs do not: the cost of a single request can vary enormously depending on how much text is sent in and how much the model generates in reply. A simple one-line question costs almost nothing. A request that floods the model's context window with a 50,000-word document and asks for a detailed step-by-step analysis costs orders of magnitude more. An attacker who understands this asymmetry can craft inputs specifically designed to maximize cost per request — without triggering any of your usual alarms.

Two Ways an Attacker Drains Your Budget#

Denial of Wallet — Two Attack Paths

Rendering diagram...

Attack Path 1: Request Abuse#

When your AI endpoint has no rate limiting, no input length validation, and no output token cap, anyone who can reach it can generate arbitrary amounts of billed compute on your account. Three patterns drive most abuse:

Token flooding. The attacker sends requests designed to fill as much of the model's context window as possible — pasting a massive text corpus, requesting a detailed analysis of a very long document, or padding inputs with repetitive content. Each of these consumes far more tokens per request than a legitimate user would. A sustained campaign can exhaust a monthly budget in hours.

Prompt injection triggering loops. As covered in §3.1, prompt injection attacks embed hidden instructions inside user input or retrieved content. A Denial of Wallet variant specifically targets the model's behavior: injecting instructions that cause the model to generate extremely long responses, repeat content, or enter tool-calling loops where the model repeatedly invokes external tools, each call generating another billed API request. Because the model is following the injected instructions, the resulting token usage can far exceed what a normal request produces.

Mass automated requests. Without rate limiting, a single attacker running an automation script can send hundreds or thousands of API calls per minute. Even if each individual request is modest in token usage, the sheer volume alone can produce ruinous costs. This is especially damaging for expensive operations such as large-context completions or multi-step agentic tool-calling chains.

Unbounded Token Consumption

High

An AI endpoint accepts unlimited input length and places no cap on output tokens, allowing a single crafted request to consume thousands of times more compute than intended.

Ch 3 · LLM-Specific ThreatsOWASP LLM10:2025

The endpoint passes user input to the AI API with no length check and no max_tokens limit. An attacker submitting a 100,000-word input and requesting a detailed response will trigger an extremely expensive API call with no safeguard.

Attack Path 2: LLMjacking#

LLMjacking refers to attackers using stolen API keys to call AI provider APIs at your expense. The term was coined by Sysdig's Threat Research Team following an investigation into active attacks targeting exposed cloud credentials in 2024.

The attack works because AI API keys are bearer tokens — meaning anyone who possesses the key can make API calls charged to the key owner's account. There is no second factor of authentication at the API call layer: if you have the key, you can use the service and generate charges. The account holder is billed for every call made with their key, regardless of who actually made it.

How keys are stolen. An API key posted in a public GitHub repository, committed inside a Docker image, returned by a misconfigured application endpoint, captured in cloud service logs with overly broad output settings, or embedded in a built JavaScript bundle can all be extracted by automated scanners. These scanners run continuously, monitoring public repositories, package registries, and container images for patterns that match AI provider key formats. Research on credential exposure shows that stolen keys are typically exploited within minutes of being published.

What attackers do with stolen keys. Some use stolen keys for their own free AI access — running expensive models at your cost. Others deploy reverse proxies and resell the access to third parties, charging a fraction of the real price while the bill lands on the compromised account. In either case, the victim typically discovers the attack only when the invoice arrives.

The cost is not theoretical. Sysdig's LLMjacking research documented a compromised Anthropic Claude 2.x key being abused across multiple regions to generate $46,000 in charges per day. In a separate incident in early 2026, a single stolen Google Gemini API key accumulated $82,314 in charges within 48 hours. A stolen key may sell on underground markets for as little as $30 — while generating damage thousands of times that amount for the victim.

The connection to Chapter 2 is direct. Section 2.6 covered how AI coding agents can leak secrets from context windows into generated code and commit history. Every credential leaked that way is a potential LLMjacking target.

Exposed AI API Key

Critical

An API key is stored insecurely — hardcoded in source code, committed to a repository, or logged by the application — allowing any attacker who finds it to use your AI quota at your expense.

Ch 3 · LLM-Specific ThreatsOWASP LLM10:2025

The API key is hardcoded directly in the source file. When this file is committed to a repository — public or private — the key becomes part of git history permanently. Deleting the file in a later commit does not remove it from existing clones or from git history.

Why Standard Rate Limiting Is Not Enough for AI Endpoints#

Rate limiting is the primary defense against request abuse, but standard rate limiting strategies were designed for APIs where requests have roughly uniform cost. AI APIs break this assumption entirely.

A rate limiter that allows 60 requests per minute provides no protection against a single request that consumes 100,000 tokens. A burst limit prevents rapid-fire attacks but does nothing about a slow, sustained campaign that stays just under the threshold while sending maximum-length inputs each time. To protect an AI endpoint effectively, rate limiting must account for what you are actually being billed for — tokens consumed, not just request count.

Rate limiting strategies for AI endpoints

Strategy	What it limits	Protects against	Blind spot
Request rate limiting	Requests per unit time	Mass low-cost request floods	A single high-token request slips through
Input length validation	Characters or tokens per request	Token flooding via oversized inputs	Does not limit output length or overall request volume
`max_tokens` cap	Output tokens per API call	Runaway response generation	Does not constrain input size or request frequency
Per-user token budget	Cumulative tokens consumed per day	Sustained high-token abuse by one user	Distributed attacks across many accounts bypass per-user limits
Spending alerts	Dollar amount of charges	Both attack paths — catches anomalies	Reactive: alert fires after charges have already occurred
Hard spending cap	Dollar amount of charges	Both attack paths — stops runaway costs	May interrupt legitimate users if set too low

A complete defense combines multiple strategies. No single control covers every Denial of Wallet scenario.

The combination that covers the most ground: validate input length, set max_tokens on every API call, enforce a per-user daily token budget, and configure a hard provider-level spending cap. Each layer catches what the previous one misses.

Configuring Provider-Side Spending Controls#

Provider-side spending controls are your last line of defense against LLMjacking — they cap the damage even if an attacker bypasses every application-level control, because they act directly on the API key at the provider's infrastructure level.

OpenAI: Set monthly spend limits in the API dashboard under Billing → Limits. Configure a notification threshold and a separate hard limit that blocks all API calls once it is reached.

Anthropic: Configure usage limits in the Console under Settings → Limits. Set both an alert threshold and a hard cap.

Google Cloud (Vertex AI / Gemini API): Use GCP Budget Alerts under Billing → Budgets & Alerts for notifications. Apply API quotas and rate limits in the API section to cap requests per minute and per day.

The rule that matters most: configure spending controls before you deploy your AI feature — not after you receive the first unexpected bill. A billing alert configured the day after an LLMjacking incident has already failed at its only job.

Practical Defenses at a Glance#

Denial of Wallet defenses by attack vector

Attack vector	Primary defense	Secondary defense
Token flooding (oversized inputs)	Validate input length; reject inputs above your character or token limit	Set `max_tokens` on every API call to cap output
Prompt injection triggering loops	Limit tool-calling iterations; cap total tokens per conversation session	Apply input scanning for injection patterns (see §3.1)
Mass automated requests	Per-user and per-IP rate limiting at the API gateway level	Bot detection or CAPTCHA for public-facing AI features
Leaked API key (LLMjacking)	Hard spending cap on your provider dashboard; billing alert at 50% of expected spend	Rotate keys regularly; scan git history before every push
No spending visibility	Configure provider billing alerts before deployment	Track per-user token cost in application-level metrics

Summary: What to Do#

The Chapter 5 checklist items that apply here: Is the AI feature rate-limited per user? Are spending alerts configured on the AI provider account?

If either of these is not true for your application, it is the highest-priority item to fix before addressing anything else in this chapter. Billing incidents from Denial of Wallet attacks have caused teams to receive invoices that exceed their entire monthly revenue — from a single overnight attack on an unprotected endpoint.

The pattern to internalize: every AI endpoint is a potential cost center. The question is not whether it will be abused, but whether you will notice before the abuse becomes financially catastrophic. Rate limiting, token caps, and spending alerts are not advanced security measures — they are the minimum viable deployment configuration for any public-facing AI feature.

Section 3.6 moves from financial attacks to a different category of risk in AI systems that retrieve external data: how attackers can corrupt the knowledge base your AI reads from, and what your application can do to prevent it.

Sources:

Previous3.4 Sensitive Information Disclosure

Next3.6 Vector and Embedding Weaknesses

3.5 Denial of Wallet — When AI Becomes Expensive on Purpose

Unbounded Token Consumption

Exposed AI API Key

Security Advisor