
Agent frameworks

Dated content — June 2026

This page names specific tools, models, and prices, which rotate quarterly. The selection logic is durable; the names are a snapshot. Cross-check the Model snapshot for current model names and pricing.

In one line: Libraries that wrap the "loop until done" pattern around your LLM — state machines, tool routing, retries, checkpointing, and (sometimes) multi-agent orchestration.

In plain English

An "agent" is just an LLM in a loop that can call tools. The loop itself is about fifteen lines of code. So why frameworks? Because the production version of those fifteen lines needs to survive a deploy, resume after a 429 (rate-limit) error, trace every step, persist state between turns, and let a human approve sensitive actions. Frameworks provide those capabilities as primitives so you don't have to reimplement them. The trade-off, as always, is that they hide the loop you might have wanted to debug.
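Of those production requirements, "resume after a 429" is usually the first one a bare loop lacks. Below is a minimal sketch of retrying a single call with exponential backoff and full jitter. `RateLimitError` is a hypothetical stand-in for whatever exception your provider SDK raises on HTTP 429. The delay values are illustrative, not tuned recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for your SDK's HTTP 429 exception."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimitError with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles each attempt (0.5s, 1s, 2s, ...), capped at max_delay.
            # Full jitter picks a random wait below it, so throttled clients don't retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Usage would look like `reply = with_backoff(lambda: llm.create(messages=messages, tools=TOOLS))`, where `llm` and `TOOLS` are whatever your loop already uses. The injectable `sleep` parameter lets tests run without actually waiting.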

The major options (2026)

| Framework | Language | Execution model | Multi-agent | Checkpointing | Best for |
|---|---|---|---|---|---|
| LangGraph | Py / TS | State machine | yes | first-class | Complex, production-grade agent flows |
| OpenAI Agents SDK | Py / TS | Linear with handoffs | yes (handoffs) | basic | OpenAI-centric, clean primitives |
| Pydantic AI | Py | Typed agent loop | yes | yes | Typed Python, structured tools |
| CrewAI | Py | Role-based teams | yes (core feature) | yes | "Team of agents" patterns |
| AutoGen / AG2 | Py | Conversational agents | yes | yes | Research-flavored multi-agent |
| Agno | Py | Lightweight | yes | yes | Faster, lighter CrewAI alternative |
| Vercel AI SDK | TS | Loop + tools | partial | partial | TS apps; agent inside a Next.js route |
| Mastra | TS | Workflows + agents | yes | yes | TS-native LangGraph alternative |
| Smolagents (HF) | Py | Code-writing agent | partial | basic | Agent that writes Python to act |
| DIY while loop | any | whatever you write | what you build | what you build | v0, always |

Default pick for most teams

Write the loop yourself first. The core loop (while not done: response = llm.call(messages); if tool_call: execute; else: done = True) is what every framework is doing under the hood. A working agent built around it fits in about 30 lines. Build it once so you understand it.

When you outgrow that — usually around "I need this to survive a deploy" or "I need to checkpoint mid-flow" — graduate to:

  • LangGraph for Python and serious complexity.
  • Pydantic AI for typed Python and simpler flows.
  • Vercel AI SDK for TypeScript apps where the agent lives behind a Next.js route.
  • OpenAI Agents SDK if you're already all-in on OpenAI and want their primitives.

When to deviate

  • Long-running flows with checkpoints and resume: LangGraph. Nothing else is as mature on persistent state.
  • Multi-agent role play (a "researcher" + "writer" + "editor" team): CrewAI or Agno.
  • Agent that needs to write and execute its own code: Smolagents + an agent runtime sandbox (see agent runtimes).
  • You want types everywhere and Pydantic on the tool args: Pydantic AI.
  • TypeScript-first stack: Vercel AI SDK or Mastra.
  • You want the simplest possible "handoff to specialist agent" pattern: OpenAI Agents SDK.

Minimum integration

DIY loop — what every framework hides:

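```python
import json

# `llm` is an OpenAI-style chat client wrapper. TOOLS (tool schemas) and
# execute_tool(name, args) are yours to define.
def run_agent(user_msg: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):  # hard cap: never loop unbounded
        r = llm.create(messages=messages, tools=TOOLS)
        msg = r.choices[0].message
        messages.append(msg)  # keep the assistant turn, including its tool calls
        if not msg.tool_calls:
            return msg.content  # no tool requested: this is the final answer
        for tc in msg.tool_calls:
            args = json.loads(tc.function.arguments)  # arguments arrive as a JSON string
            result = execute_tool(tc.function.name, args)
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": str(result)})
    raise RuntimeError("max steps reached")
```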
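To watch the loop run without an API key, one option is to drive it with a scripted fake model. This is a self-contained sketch. `scripted_llm`, `fake_tool_call`, and the `add` tool are hypothetical test doubles. They only mimic the OpenAI-style response shape the loop above reads: `choices[0].message`, `tool_calls`, and `function.arguments` as a JSON string. Here the loop takes the model call and the tool executor as parameters so it can be tested in isolation.

```python
import json
from types import SimpleNamespace as NS


def fake_tool_call(call_id: str, name: str, args: dict) -> NS:
    """Test double shaped like an OpenAI-style tool call (id, function.name, function.arguments)."""
    return NS(id=call_id, function=NS(name=name, arguments=json.dumps(args)))


def scripted_llm(replies: list):
    """Hypothetical fake model: returns pre-written messages in order and ignores its input."""
    it = iter(replies)

    def create(messages, tools):
        return NS(choices=[NS(message=next(it))])

    return create


def run_agent(user_msg, create, execute_tool, tools=(), max_steps=10):
    """The same loop, with the model call and tool executor passed in."""
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        msg = create(messages=messages, tools=tools).choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content
        for tc in msg.tool_calls:
            result = execute_tool(tc.function.name, json.loads(tc.function.arguments))
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": str(result)})
    raise RuntimeError("max steps reached")


# Turn 1: the "model" requests the add tool. Turn 2: it answers using the result.
llm = scripted_llm([
    NS(content=None, tool_calls=[fake_tool_call("call_1", "add", {"a": 2, "b": 3})]),
    NS(content="2 + 3 = 5", tool_calls=None),
])
answer = run_agent("What is 2 + 3?", llm, lambda name, args: args["a"] + args["b"])
print(answer)  # 2 + 3 = 5
```

A model that never stops calling tools hits `max_steps` and raises, which is exactly the guard you want in place before the first runaway loop.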

LangGraph — the same logic as a graph with checkpointing:

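```python
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.checkpoint.postgres import PostgresSaver  # pip install langgraph-checkpoint-postgres

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]  # reducer appends new messages to the history

# call_llm(state) -> {"messages": [ai_reply]} and run_tools(state) -> {"messages": [tool_results]}
# are your own node functions; conn_string is your Postgres URI.
graph = StateGraph(AgentState)
graph.add_node("llm", call_llm)
graph.add_node("tools", run_tools)
# Route to tools if the last LLM reply requested any; otherwise finish.
graph.add_conditional_edges("llm", lambda s: "tools" if s["messages"][-1].tool_calls else END)
graph.add_edge("tools", "llm")
graph.set_entry_point("llm")

with PostgresSaver.from_conn_string(conn_string) as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first use
    app = graph.compile(checkpointer=checkpointer)
    # Resumable across deploys, restarts, transient failures
    app.invoke({"messages": [...]}, config={"configurable": {"thread_id": "user-123"}})
```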
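Under the hood, checkpointing comes down to two moves. After each step, the state is serialized under a thread ID. On restart, the latest snapshot is loaded and the run continues. The sketch below shows only that idea, using stdlib `sqlite3` and JSON; it is not how LangGraph's `PostgresSaver` lays out its tables. The `agent_checkpoints` table, `run_resumable`, and the `step` callback are hypothetical. The state is assumed to be JSON-serializable, with a `steps` counter and a `done` flag.

```python
import copy
import json
import sqlite3
from typing import Callable, Optional


class Checkpointer:
    """Minimal state store: one JSON snapshot per thread_id, overwritten after each step."""

    def __init__(self, path: str = ":memory:"):
        # A dedicated database for agent state, kept apart from operational data.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS agent_checkpoints "
            "(thread_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )

    def save(self, thread_id: str, state: dict) -> None:
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO agent_checkpoints (thread_id, state) VALUES (?, ?)",
                (thread_id, json.dumps(state)),
            )

    def load(self, thread_id: str) -> Optional[dict]:
        row = self.db.execute(
            "SELECT state FROM agent_checkpoints WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


def run_resumable(cp: Checkpointer, thread_id: str, initial: dict,
                  step: Callable[[dict], dict], max_steps: int = 10) -> dict:
    """Resume from the last snapshot if one exists; persist after every step."""
    saved = cp.load(thread_id)
    state = saved if saved is not None else copy.deepcopy(initial)
    while not state.get("done") and state["steps"] < max_steps:
        state = step(state)
        state["steps"] += 1
        cp.save(thread_id, state)  # a crash after this line loses no completed step
    return state
```

Suppose a deploy kills the process mid-run. Calling `run_resumable` again with the same thread ID picks up at the last saved step instead of starting over.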

Vercel AI SDK — agent in a Next.js route:

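```typescript
import { generateText, tool, stepCountIs } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

// app/api/agent/route.ts (Next.js App Router). AI SDK 5+ API;
// v4 used `parameters` and `maxSteps` instead of `inputSchema` and `stopWhen`.
export async function POST(req: Request) {
  const { messages } = await req.json(); // model messages posted by the client
  const result = await generateText({
    model: anthropic("claude-sonnet-4-6"),
    tools: {
      weather: tool({
        description: "Get weather for a city",
        inputSchema: z.object({ city: z.string() }),
        execute: async ({ city }) => fetchWeather(city), // fetchWeather: your own function
      }),
    },
    stopWhen: stepCountIs(10), // cap the tool-calling loop at 10 steps
    messages,
  });
  return Response.json({ text: result.text });
}
```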

What an agent framework actually buys you

  • Checkpointing: snapshot state to Postgres / Redis so a deploy doesn't kill mid-flight agents.
  • Tracing: every node, every tool call, every retry as a structured trace.
  • Retries and backoff at the right granularity (one tool fails ≠ kill the whole run).
  • Human-in-the-loop: pause on a sensitive tool, surface to a UI, resume on approval.
  • Multi-agent handoffs with a defined interface, not ad-hoc string passing.
  • Concurrency primitives (parallel tool calls, fan-out fan-in) that are awkward in raw code.
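Two items on that list take only a few lines of `asyncio`: concurrency, and per-tool failure isolation (one tool fails ≠ kill the whole run). A minimal sketch, assuming your tools are `async` functions looked up by name in a plain dict; the registry and the example tools are hypothetical. All tool calls from one LLM turn run concurrently. A failing or slow tool becomes an error string the model can read, instead of an exception that aborts the run. The 10-second timeout is illustrative.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Tuple

ToolFn = Callable[..., Awaitable[Any]]


async def run_one(registry: Dict[str, ToolFn], name: str, args: dict, timeout: float) -> str:
    """Execute one tool call and turn any failure into a message for the model."""
    fn = registry.get(name)
    if fn is None:
        return f"error: unknown tool {name!r}"
    try:
        return str(await asyncio.wait_for(fn(**args), timeout=timeout))
    except asyncio.TimeoutError:
        return f"error: {name} timed out after {timeout}s"
    except Exception as exc:  # isolate failures: one bad tool doesn't abort the run
        return f"error: {name} failed: {exc}"


async def run_tools_parallel(registry: Dict[str, ToolFn],
                             calls: List[Tuple[str, dict]],
                             timeout: float = 10.0) -> List[str]:
    """Fan out all tool calls from one LLM turn concurrently; fan in results in call order."""
    return await asyncio.gather(*(run_one(registry, n, a, timeout) for n, a in calls))
```

`asyncio.gather` returns results in the same order as `calls`. In a loop like the DIY one, you can therefore zip them with the tool-call IDs when appending `tool` messages.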

Multi-agent caveat

Most multi-agent frameworks make it easy to spin up six agents talking to each other. Easy ≠ a good idea. Each agent multiplies token cost, latency, and failure modes. The honest 2026 default is one strong agent with many tools; reach for multi-agent only when you have a real division-of-labor reason (parallel research, distinct skill domains, separation-of-concerns for safety). See the multi-agent foundations page before adopting one.

Pricing & cost notes

All major frameworks are open-source. Hosted control planes (LangSmith for LangGraph, LlamaCloud, etc.) add ~$0–$500/mo for small teams and usage-based pricing at scale. The real cost of agent frameworks is token spend — a 10-step agent on Sonnet can cost $0.20 per run before you notice. Budget per-conversation cost in your observability tool from day one.
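Per-run cost climbs because each step resends the whole conversation, so input tokens grow with every step. Below is a rough back-of-envelope sketch that ignores prompt caching. The token counts and the $3 / $15 per million input/output token prices are placeholder assumptions; substitute current prices from the Model snapshot.

```python
def estimate_run_cost(
    steps: int,
    base_context_tokens: int,
    tokens_added_per_step: int,
    output_tokens_per_step: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Rough USD cost of one agent run, ignoring prompt caching.

    Step k resends everything so far: base_context_tokens + k * tokens_added_per_step,
    where tokens_added_per_step covers the assistant turn plus the tool result.
    """
    input_tokens = sum(base_context_tokens + k * tokens_added_per_step for k in range(steps))
    output_tokens = steps * output_tokens_per_step
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# 10 steps, 2k-token system prompt + tool schemas, ~1k tokens appended per step,
# 300 output tokens per step, placeholder $3 / $15 per million tokens.
cost = estimate_run_cost(10, 2_000, 1_000, 300, 3.0, 15.0)
print(f"${cost:.2f}")  # $0.24
```

Suppose the run doubles to 20 steps with the same parameters. Input tokens then more than triple (230,000 vs 65,000), which is why the step cap matters for cost as well as safety.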

Pitfalls

  • Reaching for a framework before you have a working while loop. You're hiding the thing you most need to understand.
  • Unbounded loops. Always set max_steps. The first runaway loop on Opus is a $50 wake-up call.
  • Multi-agent for everything. A single agent with the right tools is usually better, cheaper, and easier to debug.
  • Tools that the model can't tell apart. Two tools named search and lookup with overlapping descriptions = a confused agent. Make tool names and descriptions distinctive.
  • No checkpointing on a long-running flow. A deploy at minute 8 of a 10-minute run loses everything. Persist state.
  • Mixing checkpointing storage with operational data. Don't put agent state in the same Postgres table as your users. Separate schema or separate DB.
  • Trusting tool args without validation. The model will pass a stringified date when you asked for an int. Validate on the boundary (Pydantic / Zod).
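Two of these pitfalls, confusable tools and unvalidated arguments, both trace back to the tool definition. The sketch below assumes OpenAI-style function-tool specs: a `"type": "function"` entry whose `parameters` is a JSON Schema object. The names and descriptions say when to use each tool. A small stdlib checker rejects mistyped arguments before your code runs. Both tools are hypothetical, and the checker handles only flat objects with primitive types. Pydantic, Zod, or a full JSON Schema validator is the production-grade version of the same boundary.

```python
import json

# Distinct names plus "use when" descriptions help the model pick the right tool.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_docs",
            "description": "Full-text search over internal docs. Use for open-ended questions.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_order_by_id",
            "description": "Fetch exactly one order by numeric ID. Use only when the ID is known.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "integer"}},
                "required": ["order_id"],
            },
        },
    },
]

# JSON Schema primitive types -> Python types (flat schemas only).
_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_args(tool_name: str, raw_arguments: str) -> dict:
    """Parse the model's JSON argument string and check it against the tool's schema.

    Raises ValueError (json.JSONDecodeError is a subclass) on any mismatch.
    """
    spec = next((t["function"] for t in TOOLS if t["function"]["name"] == tool_name), None)
    if spec is None:
        raise ValueError(f"unknown tool {tool_name!r}")
    params = spec["parameters"]
    args = json.loads(raw_arguments)
    if not isinstance(args, dict):
        raise ValueError(f"{tool_name}: arguments must be a JSON object")
    missing = [k for k in params.get("required", []) if k not in args]
    if missing:
        raise ValueError(f"{tool_name}: missing required args {missing}")
    for key, value in args.items():
        prop = params["properties"].get(key)
        if prop is None:
            raise ValueError(f"{tool_name}: unexpected arg {key!r}")
        # bool is a subclass of int in Python, so reject it explicitly for non-boolean fields.
        is_bad_bool = isinstance(value, bool) and prop["type"] != "boolean"
        if is_bad_bool or not isinstance(value, _PY_TYPES[prop["type"]]):
            raise ValueError(f"{tool_name}: {key} must be {prop['type']}, got {value!r}")
    return args
```

On a `ValueError`, one common pattern is to return the error text as the tool result so the model can retry with corrected arguments, rather than crashing the run.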
