Skip to main content

Model & pricing snapshot

Last updated: June 2026

This is the guide's one deliberately volatile page. Model names, version numbers, context windows, and per-token prices change quarterly; the concepts in the rest of the guide do not. When a lesson says "frontier tier" or "workhorse pricing," the current names and numbers live here. If this page's date looks old, check the provider dashboards — the tier shape below will still be right even when the names aren't.

The three tiers — current names

TierAnthropicOpenAIGoogleOpen weights
FrontierClaude Opus 4.7GPT-5 / 5.1, o4Gemini 2.5 Ultra / Deep ThinkDeepSeek R1, Llama 4 (largest)
WorkhorseClaude Sonnet 4.6GPT-5.1 miniGemini 2.5 ProLlama-3.3-70B, Qwen-2.5-72B
Small / cheapClaude Haiku 4.5GPT-5.1 nanoGemini 2.5 Flash / Flash-LiteLlama-3.3-8B, Phi-4, Qwen-2.5-7B

Per-token prices (ballpark, $/1M tokens)

TierInputOutputTypical latency
Frontier$3–$15$15–$751–5s TTFT (slower with reasoning)
Workhorse$0.30–$3$1.50–$15300ms–1s TTFT, 80–200 tok/s
Small$0.05–$0.50$0.20–$2sub-200ms TTFT, 200–500+ tok/s

Durable ratios (these survive every price cut): frontier ≈ 4–10× workhorse ≈ 50–100× cheap, per token. Prompt-cached prefixes bill at ~10% of fresh tokens; batch APIs run ~50% off.

The major closed providers

ProviderFlagshipWorkhorseCheapContextNotable strength
AnthropicClaude Opus 4.7Claude Sonnet 4.6Claude Haiku 4.5200k–1MCoding, reasoning, agentic tool use
OpenAIGPT-5.1 / o4GPT-5.1 miniGPT-5.1 nano400kEcosystem breadth, Realtime voice, image-gen
GoogleGemini 2.5 ProGemini 2.5 FlashFlash-Lite1M–2MLong context, multimodal, price
xAIGrok 4Grok 4 mini256kX-data integration, cost

Reasoning models

OpenAI o-series (o3 / o4 / o4-mini), Claude extended thinking, Gemini 2.5 Deep Think, DeepSeek R1, Qwen3-Reasoner. Billed thinking tokens typically add 1K–30K tokens and 5–60s of latency per answer.

Open-weight landscape

Meta (Llama 3.3 / 4), Mistral (Large 2, Codestral), Alibaba (Qwen 2.5 / 3.0), DeepSeek (V3, R1), Microsoft (Phi-4), Cohere (Command). Quality gap to the closed frontier: roughly ~3 months on most benchmarks as of this snapshot. Fast managed inference: Groq, Cerebras, Fireworks, Together (Groq/Cerebras reach 1000+ tok/s on small models).

Embedding models

OpenAI text-embedding-3-large/small, Cohere embed-v4, Voyage voyage-3, open-weight bge-m3 / gte families. Typical price: $0.02–$0.13 / 1M tokens.


How to use this page: lessons link here whenever they need a concrete model name or price. If you're reading a lesson that names a model without a date, treat the name as an example of its tier, and check this page for what currently occupies that tier.

→ Concepts behind this table: Model families · Closed providers · Open models