When not to use AI
In one line: AI is the right tool when the input is open-ended and a wrong answer is recoverable. When either of those isn't true, prefer a rule, a query, or a human.
In 2026, "let's add AI to it" is the new "let's add a microservice." Most ideas don't survive a hard look. Before you reach for an LLM, ask: would a regex work? Would a query work? Would a better form work? Would a rules engine work? When the input is structured and the failure mode is dangerous, AI is the wrong tool — even though the demo will be impressive.
Don't use AI when
- A regex, query, or rules engine works. Faster, cheaper, deterministic, auditable, debuggable.
- The cost of a wrong answer is catastrophic and there's no human review step.
- The input is already structured. A SQL SELECT is the right tool for structured data.
- Latency must be sub-100ms at scale. LLM call round-trip kills the budget.
- Cost-per-decision must be sub-cent at scale and the volume is huge.
- The "AI" decision is really a UI / UX problem. Sometimes a better form is the actual answer.
- The task is high-stakes and the failure mode would be invisible. Bad answers that look good are worse than no answer.
- You can't explain what "right" looks like. Without a measurable target, you can't tell if the AI is working.
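As a minimal sketch of the first bullet, suppose support messages carry order IDs in a hypothetical format: `ORD-` followed by exactly eight digits. The format and the function name are assumptions made for illustration; the point is only that a deterministic pattern, where one exists, is trivially testable and auditable.

```python
import re

# Hypothetical order-ID format: "ORD-" followed by exactly 8 digits.
ORDER_ID = re.compile(r"\bORD-\d{8}\b")

def extract_order_ids(text: str) -> list[str]:
    """Return every order ID found in text, in order of appearance."""
    return ORDER_ID.findall(text)

print(extract_order_ids("Refund for ORD-12345678 and ORD-00000042 please"))
# ['ORD-12345678', 'ORD-00000042']
```

The same input always yields the same output, and a nine-digit ID is rejected rather than silently truncated, which is the kind of guarantee an LLM call cannot give.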
Things that look like AI problems but aren't
- "Recommend articles." Collaborative filtering still wins at scale for many cases. AI helps with cold start, not steady state.
- "Detect fraud." Gradient-boosted models on structured data still beat LLMs for tabular fraud detection — and they're 10,000x cheaper per call.
- "Search the help center." BM25 + good content is often better than RAG; add RAG only when BM25 misses genuinely semantic queries.
- "Classify the support ticket." A fine-tuned small classifier or even a rules cascade often beats an LLM call per ticket on cost and latency.
- "Spell-check / autocomplete." Real autocomplete is a trie or an n-gram model. LLM autocomplete is overkill for most fields.
- "Translate this string." For a fixed catalog of strings, a translation database wins. For real-time user content, AI wins.
- "Schedule a meeting." Most of "AI scheduling" is calendar arithmetic. The AI part is parsing the human's request, which is one prompt — not the whole product.
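To make the scheduling item concrete, here is one possible sketch of the "calendar arithmetic" half, assuming the human's request has already been parsed into a working window and a duration. The function name and the representation of busy time as `(start, end)` pairs are assumptions, not a reference to any calendar API.

```python
from datetime import datetime, timedelta

def first_free_slot(busy, window_start, window_end, duration):
    """Return the start of the earliest gap of at least `duration`, or None.

    `busy` is a list of (start, end) datetime pairs, possibly overlapping,
    combined across all attendees.
    """
    cursor = window_start
    for start, end in sorted(busy):
        # The gap before this busy block is usable only up to window_end.
        if min(start, window_end) - cursor >= duration:
            return cursor
        cursor = max(cursor, end)
    return cursor if window_end - cursor >= duration else None

day = datetime(2026, 1, 5)
busy = [
    (day.replace(hour=9), day.replace(hour=10)),
    (day.replace(hour=10, minute=30), day.replace(hour=12)),
]
slot = first_free_slot(busy, day.replace(hour=9), day.replace(hour=17),
                       timedelta(minutes=45))
print(slot)  # 2026-01-05 12:00:00
```

The 30-minute gap at 10:00 is too short for a 45-minute meeting, so the first fit is noon. None of this needs a model; only turning "sometime Monday, 45 minutes" into these arguments does.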
When AI clearly is the right tool
- Generation of free-form content (emails, code, marketing copy, summaries).
- Summarization / rewriting of unstructured text.
- Open-ended Q&A over unstructured corpora.
- Reasoning over messy inputs that humans currently spend cognition on.
- Tool orchestration where the right tool depends on the request.
- Classification of long unstructured inputs where labeling-then-training is too slow to keep up.
- Multimodal understanding (image, audio, video → text).
The pre-mortem
Before greenlighting an AI feature, ask: if this fails in the worst plausible way, what happens?
- If the answer is "a customer doesn't notice" — proceed.
- If the answer is "a customer gets a slightly worse experience" — proceed with monitoring.
- If the answer is "a user is harmed, a contract is lost, a regulator notices, or our brand takes a hit" — design for human oversight from day one, or pick a different solution.
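The three pre-mortem outcomes above can be written down as a small lookup, which makes the decision explicit in a design review. This is only an illustrative encoding; the enum and function names are hypothetical.

```python
from enum import Enum

class WorstCase(Enum):
    UNNOTICED = 1  # a customer doesn't notice
    DEGRADED = 2   # a customer gets a slightly worse experience
    HARMFUL = 3    # harm, lost contract, regulator attention, or brand damage

DECISIONS = {
    WorstCase.UNNOTICED: "proceed",
    WorstCase.DEGRADED: "proceed with monitoring",
    WorstCase.HARMFUL: "design for human oversight or pick a different solution",
}

def premortem(worst_case: WorstCase) -> str:
    """Map the worst plausible failure to the recommended next step."""
    return DECISIONS[worst_case]

print(premortem(WorstCase.DEGRADED))  # proceed with monitoring
```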
When this rule doesn't apply
- You're using AI as one feature among many and the failure mode is contained to that feature. AI in a draft mode that the user accepts or rejects is low-risk.
- The bar is "better than what we have," not "perfect." A 90% accurate summary beats no summary, even if it has occasional errors.
- You're in an exploratory phase and the cost of getting it wrong is small. Prototype freely; productionize carefully.
Common mistakes
- "AI" as feature theater. Adding an AI feature to look modern, with no real user demand. These features ship, generate one demo gif, and then nobody uses them.
- Confusing "the demo is impressive" with "this is shippable." Demos run on cherry-picked inputs. Production runs on whatever users actually type.
- Skipping the "is this even an AI problem?" question. Pressure from engineering management to "add AI" leads to LLM calls bolted onto features that were fine without them.
- Pricing the failure mode as zero. "It's just a chat assistant, what's the worst that could happen?" The worst is a hallucinated medical or legal claim that lands you in the news.
How to apply it
For every proposed AI feature, run a 5-minute filter:
- Is the input structured? If yes, prefer a query or rules.
- Could a regex / template / form fix this? If yes, do that.
- What's the worst plausible failure? If catastrophic, require human-in-the-loop or kill the idea.
- What's the latency budget? If sub-100ms, AI usually doesn't fit.
- What's the eval that proves this is working? No eval = not ready.
If the feature survives the filter, it's worth building.
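One way to keep the filter honest is to encode it as a checklist that returns objections instead of a yes/no. The sketch below is an assumption about how a team might do that; the field names are hypothetical, and the 100 ms threshold is the one quoted in the filter.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    input_is_structured: bool
    simpler_fix_exists: bool          # regex / template / form
    worst_failure_catastrophic: bool
    human_in_the_loop: bool
    latency_budget_ms: float
    has_eval: bool

def run_filter(p: Proposal) -> list[str]:
    """Return the objections raised by the filter; empty means it survives."""
    objections = []
    if p.input_is_structured:
        objections.append("structured input: prefer a query or rules")
    if p.simpler_fix_exists:
        objections.append("a regex, template, or form would fix this")
    if p.worst_failure_catastrophic and not p.human_in_the_loop:
        objections.append("catastrophic failure mode without human review")
    if p.latency_budget_ms < 100:
        objections.append("sub-100ms latency budget")
    if not p.has_eval:
        objections.append("no eval: not ready")
    return objections

draft_reply = Proposal(False, False, False, True, 2000, True)
print(run_filter(draft_reply))  # []
```

Returning the list of objections rather than a boolean keeps the reasons visible when a feature is rejected or sent back for redesign.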
Worked example
A SaaS team plans an "AI-powered search" over their docs. Six weeks in: RAG works, but it's slower than the existing keyword search, costs 100x more per query, and the answers are sometimes confidently wrong.
They go back to first principles. The data: ~80% of user queries are exact-match for a known feature name. ~15% are typos. ~5% are genuinely conceptual.
The fix: ship a better keyword search with typo tolerance and an "ask AI" button for the 5%. The button calls RAG only when the user opts in. Search latency drops 90%. Cost drops 95%. The AI feature still exists, but only where it actually helps.
The lesson: "we're adding AI to search" wasn't wrong, but the framing was. "We're adding AI to the 5% of search queries keyword can't handle" is the right framing — and it falls out of asking "when not to use AI" honestly.