AI security — the OWASP LLM Top 10

In one line: The OWASP LLM Top 10 is the industry-standard list of how LLM apps get attacked. Learn all ten — and the rule that ties them together: the model is never your security boundary.

In plain English

OWASP is the group behind the famous web-app security "Top 10." They now maintain a Top 10 for LLM Applications — the shared checklist that security teams and employers expect you to know. The mistakes are surprisingly mundane: trusting the model's output, giving an agent too much power, putting secrets where the model can blurt them out. Here's all ten in plain words, each with the structural fix — a real control in your code, not "please don't hack me."

The cardinal rule

Every item below is the same idea wearing a different hat: the LLM is untrusted. Authorization, validation, and limits run in deterministic code around the model — never inside the prompt. If your only defense is telling the model "ignore malicious instructions," you don't have a defense.

The ten risks (2025 edition)

LLM01 — Prompt injection

Attacker text overrides your instructions — either directly ("ignore your rules and...") or indirectly, hidden inside a web page, PDF, or email your app reads. It's the #1 risk for the second edition running. Defense: treat every model output as untrusted; give tools least privilege; keep system instructions separate from user and retrieved content; require human approval for high-impact actions; filter inputs and outputs.

LLM02 — Sensitive information disclosure

The model reveals PII, secrets, or another user's data. Defense: scrub PII before it ever reaches the model or the logs; never put secrets or API keys in prompts; filter outputs.

LLM03 — Supply chain

A poisoned base model, dataset, adapter, or dependency. Defense: vet provenance, pin versions, scan dependencies, prefer reputable sources.

LLM04 — Data and model poisoning

Tainted training, fine-tune, or RAG data quietly changes behavior or plants a backdoor. Defense: source-control and validate your data; isolate and review untrusted documents before they enter the knowledge base.

LLM05 — Improper output handling

Downstream code trusts model output — which then produces SQL, HTML, or shell commands, leading to injection, XSS, or remote code execution. Defense: treat model output exactly like untrusted user input — encode it, validate it, use parameterized queries, and sandbox any generated code.

LLM06 — Excessive agency

The agent can simply do too much — too many tools, too-broad permissions, too much autonomy. Defense: give the minimum tools; scope each permission tightly; keep a human in the loop for irreversible actions; gate tool calls with deterministic checks.

LLM07 — System prompt leakage

Secrets or business logic hidden in the system prompt get extracted by users. Defense: assume the system prompt is public. Never put credentials, keys, or access rules in it — enforce those in code.

LLM08 — Vector and embedding weaknesses

RAG-specific: a poisoned knowledge base, or one user retrieving another tenant's documents. Defense: apply authorization at retrieval time (filter by who is asking, not just at generation); isolate tenants; validate ingested content.

LLM09 — Misinformation

Confident hallucinations, and humans over-trusting them. Defense: ground answers with retrieval and citations; surface uncertainty; require human review for high-stakes outputs; add faithfulness evals (see eval-driven development).

LLM10 — Unbounded consumption

"Denial of wallet" — expensive or runaway calls drain your budget or take the service down. Defense: rate limits, per-user and per-session token/cost budgets, request timeouts, and hard caps on agent loop iterations.

Guardrails — defense outside the model

A guardrail is a check that runs in regular code, wrapped around the model:

Input guardrails — detect prompt-injection patterns, scrub PII, block disallowed topics.
Output guardrails — validate format, strip PII, run a safety classifier before anything is shown or executed.
Tool-call guardrails — verify arguments and permissions before an action runs.

Open tools: NeMo Guardrails, Guardrails AI, Llama Guard.

Red-teaming — attack yourself first

Before shipping, deliberately try to break your own app: fire injection payloads, jailbreak attempts, content-filter bypasses, and access-control probes. For anything high-risk, run it in CI. Tools: promptfoo, deepteam, garak.

What people get wrong

Trying to "prompt away" injection instead of constraining what the model can actually do.
Handing an agent broad tool access "to be helpful."
Trusting model output in a downstream system (SQL / HTML / shell).
Putting authorization rules in the system prompt.
No cost ceiling — one runaway loop empties the account.

Further reading: OWASP GenAI Security Project — LLM Top 10. See also Safety & privacy and enterprise governance. For the full discipline behind these controls — threat modeling, guardrails, and red-teaming — go deep in Chapter 6: Responsible & Safe AI.

🤔 Quick checkQuick check

The cardinal rule​

The ten risks (2025 edition)​

LLM01 — Prompt injection​

LLM02 — Sensitive information disclosure​

LLM03 — Supply chain​

LLM04 — Data and model poisoning​

LLM05 — Improper output handling​

LLM06 — Excessive agency​

LLM07 — System prompt leakage​

LLM08 — Vector and embedding weaknesses​

LLM09 — Misinformation​

LLM10 — Unbounded consumption​

Guardrails — defense outside the model​

Red-teaming — attack yourself first​