Closed-source providers
This page names specific tools, models, and prices, which rotate quarterly. The selection logic is durable; the names are a snapshot. Cross-check the Model snapshot for current model names and pricing.
In one line: Hosted, API-only foundation models. You POST messages, you get tokens. Lowest operational burden, highest frontier quality, and what 90% of teams ship on in 2026.
A "closed provider" is a company that trained a giant model, hides the weights, and rents you access through an HTTP endpoint. You don't manage GPUs, you don't manage scaling, you don't manage model files. You send a request and get an answer. Anthropic, OpenAI, Google, and xAI are the big four. The trade-off is that you can never run their best models on your own hardware — you're a tenant, not an owner.
The major four (May 2026)
| Provider | Flagship | Workhorse | Cheap tier | Context window | Notable strength |
|---|---|---|---|---|---|
| Anthropic | Claude Opus 4.7 | Claude Sonnet 4.6 | Claude Haiku 4.5 | 200k–1M | Coding, reasoning, agentic tool use |
| OpenAI | GPT-5.1 / o4 | GPT-5.1 mini | GPT-5.1 nano | 400k | Ecosystem breadth, Realtime, image-gen |
| Gemini 2.5 Pro | Gemini 2.5 Flash | Gemini Flash-Lite | 1M–2M | Long context, multimodal, price | |
| xAI | Grok 4 | Grok 4 mini | — | 256k | X-data integration, cost |
Versions drift quarterly. Treat names as snapshots; what stays true is the tier shape — every major provider ships flagship / workhorse / cheap.
Default pick for most teams
Claude Sonnet for the workhorse, Claude Haiku for cheap, GPT-5.1 as the fallback. This combo gives you frontier-class reasoning, the cheapest competent classifier in the market, and a different provider to fail over to when Anthropic has a bad afternoon.
If you're already deeply inside Google Cloud or need 1M+ token contexts, swap the default to Gemini 2.5 Pro as your workhorse and Sonnet as the fallback.
When to deviate
- Frontier-only quality needed (hard reasoning, legal/medical, complex agents): upgrade workhorse to Claude Opus or OpenAI o4.
- Very long context (whole books, full codebases, multi-hour transcripts): Gemini 2.5 Pro — nothing else competes at the 1M+ end.
- Cheapest possible classification at high volume: Gemini Flash-Lite or Haiku — both are ~$0.25–$1 per million input tokens.
- You ship inside the X / Grok ecosystem or want native access to real-time X data: Grok 4.
- Compliance / data residency rules out US hyperscalers: regional offerings from Mistral, Aleph Alpha, or one of the closed providers via Azure / Bedrock / Vertex in the right region.
Minimum integration
# Anthropic — three lines to a working call
from anthropic import Anthropic
client = Anthropic() # reads ANTHROPIC_API_KEY from env
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
messages=[{"role": "user", "content": "Summarize the Bohr model in one sentence."}],
)
print(response.content[0].text)
// OpenAI — same shape, TypeScript
import OpenAI from "openai";
const client = new OpenAI(); // reads OPENAI_API_KEY
const r = await client.chat.completions.create({
model: "gpt-5.1",
messages: [{ role: "user", content: "Summarize the Bohr model in one sentence." }],
});
console.log(r.choices[0].message.content);
That's the entire SDK story for the simple case. Everything else — streaming, tools, structured output, caching — is additive flags on this shape.
Pricing & cost notes (May 2026 ballpark)
| Tier | Input ($/Mtok) | Output ($/Mtok) | Use it for |
|---|---|---|---|
| Flagship (Opus, GPT-5.1, Gemini 2.5 Pro) | $5–$15 | $15–$75 | Hard reasoning, agents, code |
| Workhorse (Sonnet, mini) | $1–$3 | $5–$15 | Most product features |
| Cheap (Haiku, nano, Flash-Lite) | $0.10–$0.50 | $0.40–$2 | Classification, extraction, batch |
Three discipline tricks that pay for themselves immediately:
- Prompt caching. Anthropic and OpenAI both bill cached prefixes at ~10% of fresh tokens. If your system prompt is 4k tokens, cache it.
- Batch APIs. 50% off if you can tolerate up to 24h latency — see the batch inference page.
- Tier routing. Don't call Opus to format a date. Cheap tier for cheap tasks.
Why not pick one and forget it?
- Outages. Every provider has had a multi-hour bad day in the last 12 months. Have a fallback wired up.
- Model drift. Even within the same model name, behaviors shift between minor versions. Multi-provider gives you A/B leverage.
- Cost arbitrage. Sonnet for prod, Haiku for batch, Gemini Flash for classification can cut spend 60%+.
- Capability gaps. OpenAI ships Realtime voice; Anthropic ships the best tool-use; Gemini ships 2M context. You will eventually want all three.
Tools that abstract the "many providers" problem: Portkey, OpenRouter, LiteLLM, Vercel AI SDK — covered in LLM SDKs and AI gateways.
Pitfalls
- Hard-coding a model name in 50 places. Use a constant. When
claude-sonnet-4-6is deprecated, you want one diff, not fifty. - Treating provider SDKs as interchangeable at the response shape. They are not. Anthropic returns
content: [{type:'text', text:...}]; OpenAI returnschoices[0].message.content. Abstract this once or your code will rot. - No spend limit on the provider dashboard. A loop that retries on failure can drain $50k in an hour. Set hard caps before you ship.
- Putting the API key in a frontend env var.
NEXT_PUBLIC_OPENAI_API_KEYis a way to get your account drained. Always proxy through your backend. - Trusting marketing benchmarks. Vendor leaderboards are gamed. Run your own eval suite — see eval tools — before you bet your roadmap on a model.
- Ignoring rate limits until production. New accounts have low TPM caps. Request increases before launch day, not during the outage.
→ Next: Open-weight models