Skip to main content

Framework Pick — When to Use Which (or None)

Dated content — June 2026

This page names specific tools, models, and prices, which rotate quarterly. The selection logic is durable; the names are a snapshot. Cross-check the Model snapshot for current model names and pricing.

In one line: Raw SDK is the right answer until it isn't — then pick the framework that matches the one abstraction you actually need (provider-swap, agent loops, RAG pipelines, evaluation).

In plain English

A framework is a pre-built kit of code that promises to save you from writing common plumbing yourself — and the AI world has a crowded shelf of them. The catch is that every kit hides details, and when something breaks inside one you don't understand, you're stuck. This page decides when to use a kit at all, and if so, which one fits the single problem you actually have. The rule it teaches: build the thing by hand once first, so you know exactly what the kit is doing for you — and keep the kit at arm's length in your code, because these kits change fast and you'll likely swap them someday.

The cardinal rule

Don't adopt a framework until you've built the equivalent yourself once. Frameworks paper over real complexity that you need to understand. Adopting one in Stage 1 means you have no model for what's happening when (not if) it leaks.

By the time you've completed Part I, you've built:

  • A raw LLM call (Stage 1).
  • A streaming chat with state (Stage 2).
  • Structured output (Stage 3).
  • A tool-calling loop (Stage 4).
  • A RAG pipeline (Stage 5).
  • An eval runner (Stage 6).
  • An observability layer (Stage 7).
  • An agent (Stage 8).

NOW you have something to compare frameworks against.

What each framework actually buys you

Provider-abstraction layer (raw call → many providers)

The lowest-friction win: write code once, call any model.

FrameworkLanguageNotes
Vercel AI SDKTypeScriptThe cleanest abstraction; streamText, generateObject, tool(); bind to any provider
LiteLLMPythonMost extensive provider list; proxy-server mode for ops
Pydantic AIPythonType-safe agents on top of any provider
Anthropic SDK / OpenAI SDKBothRaw; one provider only

If you're routing between models or want vendor flexibility, this is the abstraction worth adopting.

Agent loop frameworks

If you found yourself reimplementing the Stage 8 loop with retries, caps, tracing, and tool registries — these wrap that for you.

FrameworkLanguageNotes
OpenAI Agents SDKPython/TSOfficial, opinionated, handles handoffs
LangGraphPythonAgents as state graphs; great for branching workflows
Pydantic AIPythonTyped agents; clean ergonomics if you like the Pydantic style
MastraTypeScriptTS-native; opinionated about agent shape
Vercel AI SDK (maxSteps in generateText)TypeScriptLightweight, for chatbot-shaped agents

The agent-framework question: do you want graph-of-states (LangGraph), tool-based handoffs (Agents SDK), or just-a-loop-with-tools (Vercel AI SDK)?

RAG framework

If you found yourself reimplementing chunking + indexing + retrieval + reranking:

FrameworkLanguageNotes
LlamaIndexPythonMost mature; covers everything from parsing to evaluation
LangChainPython/TSBroader scope; RAG is one of many use cases
HaystackPythonStrong production focus; cleaner than LangChain

LlamaIndex is generally the right pick if RAG is your main thing. LangChain is the right pick if you want one framework for RAG + agents + tools + everything.

Eval framework

→ Covered in Eval tool pick. Short answer: Braintrust if hosted, Promptfoo or DeepEval if open-source.

Full-stack opinionated bundles

FrameworkLanguageWhat you get
DSPyPythonA whole paradigm — programmatic prompt optimization; not a "framework" so much as a system
Inferable / Mastra / GenkitTSOpinionated end-to-end LLM app frameworks

These are higher-risk picks; the abstraction is heavier, the community is smaller, but if the paradigm clicks for your team, productivity gains can be large.

The matrix

Your needPick
One provider, simple chatRaw SDK
Multi-provider routing in TypeScriptVercel AI SDK
Multi-provider routing in PythonLiteLLM or Pydantic AI
Complex agent with branching stateLangGraph
Simple agent loop with toolsOpenAI Agents SDK or Vercel AI SDK
Heavy RAG pipeline (PDF parsing, chunking, reranking)LlamaIndex
Production focus, no opinionHaystack
Type-safe Python agentsPydantic AI
Want to experiment with prompt-as-programDSPy
You're shipping in TypeScript and want one frameworkVercel AI SDK or Mastra

What LangChain is good at (and what it isn't)

LangChain remains the most-discussed AI framework. Some honest evaluation:

Good at:

  • Coverage — has integrations for everything.
  • Tutorials — most online content uses LangChain.
  • Quick prototypes — you can wire up a RAG in 50 lines.

Less good at:

  • API stability — frequent breaking changes; documentation gets out of date.
  • Abstraction tax — the wrappers can hide important details (token counts, latency).
  • Debugging — Chain/Runnable traces can be hard to follow without LangSmith.
  • "Heavy" feel for simple use cases.

If you want LangChain's coverage and prefer a cleaner core, LangGraph (same org, agent-focused) and Haystack (different org, production focus) are alternatives.

When NO framework is the right answer

If your AI feature is:

  • One LLM call per user action.
  • Simple structured output extraction.
  • A chatbot with no tools, no RAG.
  • A specific narrow task you've built once.

Raw SDK + ~200 lines of glue is cleaner than any framework. The framework only earns its keep when you have repeated patterns across features.

How frameworks evolve (a warning)

The 2024–2025 LangChain ecosystem looks very different from the 2022 one. AI frameworks have a higher churn rate than most software — APIs change, defaults change, recommended patterns change.

Build a framework-isolation layer in your code:

# bad: framework calls scattered everywhere
from langchain.chains import RetrievalQA
qa = RetrievalQA.from_chain_type(llm=ChatOpenAI(...), retriever=vectorstore.as_retriever())

# better: thin internal interface, framework hidden behind it
class RAGService:
def __init__(self): self._chain = build_chain() # hidden
def answer(self, q: str) -> str: ...

When the framework breaks (a major version, a deprecation, a switch to a different framework), only RAGService changes — not every caller.

Common mistakes

Where people commonly trip up
  • Adopting a framework before understanding the raw call. You can't debug what you don't understand. Build the raw version once; THEN evaluate frameworks.
  • Using two frameworks in the same project. "LangChain for RAG, OpenAI Agents SDK for the agent." Now you have two abstraction layers, two debugging patterns, two upgrade cycles. Pick one ecosystem.
  • Locking in early. "This framework had the best tutorial six months ago." Frameworks shift. Build with abstraction layers so you can swap.
  • Refusing all frameworks out of purism. Raw code is great until you've reimplemented the loop, the eval runner, and the observability layer for the third project. At that point, the framework's a productivity multiplier.
  • Picking based on GitHub stars. Star counts reflect 2-year-old momentum. Evaluate by API stability, current docs quality, recent commit activity, and whether the abstractions match YOUR shape of problem.
🤔 Quick checkQuick check

→ Next: Eval tool pick — Braintrust, Promptfoo, DeepEval, and friends.