What would hurt — the pre-mortem

In one line: Before you ship an AI feature, write down the worst plausible thing it could do, decide if you can live with that outcome, and design the guardrails before launch — not after.

In plain English

Most AI incidents could have been predicted. "What if the model hallucinates a price?" "What if a user uploads a prompt injection?" "What if a tool call deletes the wrong record?" Asking these questions before launch costs a meeting. Discovering them after launch costs a post-mortem, a customer apology tour, and sometimes a contract. Do the pre-mortem.

The exercise

Sit the team in a room (or doc). For the feature you're about to ship, answer:

What's the worst plausible thing this could do to a user?
What's the worst plausible thing this could do to the business? (PR, lost customer, regulator notices.)
What's the worst plausible thing this could cost? (Runaway cost, infinite loop.)
What's the worst plausible thing an adversary could make it do? (Prompt injection, jailbreak, data exfiltration.)
What's the worst silent failure? (Wrong answer that looks confidently right and goes unnoticed.)

For each one, decide: can we live with this happening? If yes, log and ship. If no, what guardrail prevents it?

Categories to walk through

User harm

Wrong medical / legal / financial advice.
Offensive or biased output to vulnerable users.
Encouraging unsafe actions.
Misrepresenting capabilities ("I'll call you back" when there's no follow-up).

Business harm

Hallucinated pricing, terms, or commitments.
Output that looks like an official company statement and isn't.
Confidential info leaked to wrong customer.
Reputational damage from a screenshot going viral.

Operational harm

Infinite agent loops burning cost.
Single user driving 1000x the average load.
Tool calls with side effects (delete, refund, send email) firing wrongly.
Cascading failures (one bad agent step triggers many).

Adversarial harm

Prompt injection from user input or from retrieved content.
Jailbreaks producing harmful or off-policy output.
Adversary uses the AI to enumerate internal data.
Adversary uses an agent's tools to take harmful actions.

Silent quality decay

A model upgrade subtly worsens a key workflow.
A retrieval index goes stale and answers degrade.
A new prompt edit fixes one case and breaks ten others.
Cost-per-call doubles without anyone noticing.

Guardrails to consider

Once you've named the failures, the guardrail set is fairly stable:

Kill switch. A config flag that disables the feature or routes to a deterministic fallback.
Cost cap. Per-request and per-user spend ceiling.
Step cap. Agents can't loop more than N times.
Human in the loop for high-stakes actions (refunds, deletions, sending to external recipients).
Output validators that catch obvious failure modes before user sees them.
Eval gate in CI that catches regressions on the categories you care about.
Production monitoring for the silent decay categories.
Logging + sampling so you can review what the system actually did.
Prompt injection defenses for any feature that incorporates untrusted content.

When this rule doesn't apply

Genuinely throwaway internal tools where the worst case is "engineer notices it's wrong." Pre-mortem is overkill.
The feature is gated to a small alpha cohort where you can fix-as-you-go. Even then, write the worst-case before scaling.
Time-critical incident response where the choice is "ship now with one guardrail or lose the customer." Ship with the kill switch, do the full pre-mortem in the followup.

Common mistakes

Doing the pre-mortem too late. "Pre-mortem at the launch readiness review" is too late. Do it during design.
Hand-waving the adversary case. "Nobody would think to do that" is the worst sentence in security engineering. Someone will.
Pricing the worst case at zero because it's unlikely. Low-probability × catastrophic-impact is still an unacceptable expected value for some failure modes.
Skipping the silent-failure category. Most AI incidents are silent — wrong answers that look right. These are the hardest to design against and the most likely to bite.
Treating the pre-mortem as a compliance ritual. Filling out a template without actually thinking through the failures defeats the purpose. The point is to think, not to document.

How to apply it

For every AI feature about to ship:

Schedule a 60-minute pre-mortem with engineers + PM + a security/legal stakeholder.
Walk through the five categories. Write the worst plausible failure in each.
For each, decide: accept, mitigate, or kill.
Mitigations become launch blockers, not "we'll add it next sprint."
Document the accepted risks so future-you can re-evaluate.

The 60 minutes prevents the 60-day fire drill.

Worked example: the launch that didn't crash

A team is about to launch an AI feature that drafts customer-facing emails for sales reps. The pre-mortem surfaces:

User harm: low — rep reviews before sending.
Business harm: the AI could draft something inappropriate that the rep sends without reading. Mitigation: a "this is AI-drafted, please review" banner in the UI, plus a hold for emails containing certain keywords ("guarantee," "promise," dollar amounts above a threshold).
Operational harm: rep clicks "generate" repeatedly, runs up cost. Mitigation: cap at 10 generations per email, log usage.
Adversarial: customer email content could contain prompt injection. Mitigation: strip user-controlled content from the system prompt, use it as data only.
Silent decay: model upgrade changes tone. Mitigation: weekly eval of tone-of-voice on 50 sample drafts.

The mitigations take 1 sprint to add. The feature launches with no incidents. Three months in, a customer's email does contain a prompt injection that would have produced a problematic draft — the guardrail catches it.

The cost of the pre-mortem: one meeting. The cost of the incident if launched without it: a chunk of a quarter, a customer apology, a possible board update. Always worth it.

🤔 Quick checkQuick check

→ Next: When to buy an agent platform.

The exercise​

Categories to walk through​

User harm​

Business harm​

Operational harm​

Adversarial harm​

Silent quality decay​

Guardrails to consider​

When this rule doesn't apply​

Common mistakes​

How to apply it​