AI gateways

Dated content — June 2026

This page names specific tools, models, and prices, which rotate quarterly. The selection logic is durable; the names are a snapshot. Cross-check the Model snapshot for current model names and pricing.

In one line: A reverse proxy in front of LLM providers. You call the gateway; the gateway calls the provider. In return you get routing, fallback, caching, rate limiting, cost tracking, PII redaction, and one bill.

In plain English

A gateway sits between your application and every LLM provider. To your app, it looks like one URL with one API key. Behind it, the gateway can route to Anthropic, OpenAI, Google, your self-hosted vLLM, or all of them in some priority order. You stop hard-coding provider names; you start declaring policies ("send classification to Haiku; fail over to Mini if Haiku is down; cache identical prompts for 60s"). Most teams add one within 6–12 months of going to production.
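The "declare policies, not providers" idea can be sketched as plain data plus a resolver. This is an illustration only, not any gateway's real config format: the `Policy` class, the `POLICIES` table, and the aliases `haiku`, `mini`, and `workhorse` are hypothetical names chosen to mirror the example policy above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """One routing rule: an ordered list of models plus a cache TTL."""
    models: tuple[str, ...]   # tried in order; the first entry is the primary
    cache_ttl_s: int = 0      # 0 means "do not cache" for this task


# Hypothetical policy table; the model aliases are placeholders.
POLICIES = {
    "classification": Policy(models=("haiku", "mini"), cache_ttl_s=60),
    "default": Policy(models=("workhorse",)),
}


def resolve(task: str, down: frozenset[str] = frozenset()) -> str:
    """Return the first model in the task's policy that is not marked down."""
    policy = POLICIES.get(task, POLICIES["default"])
    for model in policy.models:
        if model not in down:
            return model
    raise RuntimeError(f"no healthy model for task {task!r}")


print(resolve("classification"))                        # haiku
print(resolve("classification", frozenset({"haiku"})))  # mini
```

The application only ever names a task; which provider serves it is decided in one place and can change without a code deploy.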

The major options (2026)

Gateway	Hosting	OSS?	Routing	Caching	PII	Best for
Portkey	Hosted + self-host	partial	rules + fallback	yes	yes	Full-featured production default
OpenRouter	Hosted only	no	model-marketplace	partial	basic	One key, hundreds of models, unified billing
LiteLLM Proxy	Self-host	yes	rules + fallback	yes	partial	OSS default, OpenAI-compatible facade
Cloudflare AI Gateway	Hosted	no	basic	yes	basic	Cheap, fast, already on Cloudflare
Helicone	Hosted + OSS	partial	rules	yes	yes	Started as obs; now also gateway
Kong AI Gateway	Self / hosted	yes (CE)	rules + plugins	yes	yes	Enterprise; existing Kong shops
Apigee AI	Google Cloud	no	enterprise rules	yes	yes	Enterprise on GCP
AWS Bedrock	AWS only	no	one-API for many models	partial	via Guardrails	All-AWS shops
Vercel AI Gateway	Hosted	no	basic	yes	basic	Vercel-deployed apps

Default pick for most teams

Portkey if you want managed, LiteLLM Proxy if you want self-hosted and free. Both speak the OpenAI-compatible API shape, support fallback and routing rules, and give you cost dashboards out of the box.

If you just want "one API key for every model" without much else: OpenRouter. It has fewer features but the lowest-friction onboarding in the category.

When to deviate

You're a developer trying every model under the sun: OpenRouter — the model marketplace and unified billing are the killer feature.
You're entirely on AWS / Azure / GCP and want no extra vendor: Bedrock, Azure OpenAI, or Vertex AI with Apigee.
You're already running Kong for your other APIs: Kong AI Gateway — same control plane.
Cheap and basic is fine (mostly observability + cache): Cloudflare AI Gateway at near-zero cost.
You want gateway + observability + prompt mgmt + evals in one tool: Portkey or Helicone.

What a gateway gives you

Provider failover. Anthropic 503 → automatic retry on OpenAI. One fewer incident to respond to, one fewer middle-of-the-night page.
Routing rules. Cheap model for short prompts; flagship for long; specific model for code; pick by tenant tier.
Caching. Identical prompt → cached response, often at 1% of the original cost. Especially effective for repeated system prompts and FAQ-style queries.
Rate limits per user / tenant / API key. Stops one runaway customer from burning your whole budget.
Single billing surface across providers. Procurement loves this.
PII / secrets scrubbing at the proxy layer, before requests leave your perimeter.
Centralized observability for every team's LLM calls — even teams that didn't wire up Langfuse.
Semantic caching. Beyond exact-match: cache "What's our refund policy?" against a previously-cached "What is your refund policy?".
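Two items from the list above, failover and exact-match caching, fit in a small in-process sketch. Everything here is illustrative: `TinyGateway`, `ProviderError`, and the two stub providers are hypothetical, and a real gateway would add timeouts, streaming, and shared cache storage. The `scope` argument is an assumption about how per-tenant or per-user cache keys might be kept separate.

```python
import hashlib
import json
import time
from typing import Callable

# A provider is any callable that takes a prompt and returns text.
Provider = Callable[[str], str]


class ProviderError(Exception):
    """Raised by a provider stub to simulate a 5xx response."""


class TinyGateway:
    def __init__(self, providers: list[tuple[str, Provider]], ttl_s: float = 60.0):
        self.providers = providers   # ordered: primary first
        self.ttl_s = ttl_s
        self._cache: dict[str, tuple[float, str]] = {}

    @staticmethod
    def _key(prompt: str, scope: str) -> str:
        # The scope (for example a tenant or user id) keeps answers apart.
        raw = json.dumps({"scope": scope, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def complete(self, prompt: str, scope: str = "shared") -> tuple[str, str]:
        """Return (source, text); source is 'cache' or the provider name."""
        key = self._key(prompt, scope)
        now = time.monotonic()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self.ttl_s:
            return "cache", hit[1]
        last_error = None
        for name, call in self.providers:
            try:
                text = call(prompt)
            except ProviderError as exc:
                last_error = exc
                continue   # fail over to the next provider in the list
            self._cache[key] = (now, text)
            return name, text
        raise RuntimeError("all providers failed") from last_error


def flaky(prompt: str) -> str:
    raise ProviderError("503 from primary")


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


gw = TinyGateway([("primary", flaky), ("secondary", healthy)])
print(gw.complete("hi"))   # ('secondary', 'echo: hi')
print(gw.complete("hi"))   # ('cache', 'echo: hi')
```

The second call never reaches a provider, which is where the cost saving comes from; the first call shows the primary's failure being absorbed without the caller noticing.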

Minimum integration

LiteLLM Proxy — self-hosted gateway in 20 lines of config:

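# config.yaml
model_list:
  # Primary deployment, exposed to clients under the alias "workhorse"
  - model_name: workhorse
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
  # Secondary deployment under its own alias so it can be a fallback target
  # (two entries sharing one model_name would be load-balanced instead)
  - model_name: workhorse-fallback
    litellm_params:
      model: openai/gpt-5.1
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  routing_strategy: simple-shuffle
  # Fallbacks map a model_name to other model_name entries, tried in order
  fallbacks: [{"workhorse": ["workhorse-fallback"]}]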

litellm --config config.yaml --port 4000

# Your app: point at the gateway and request the alias defined in model_list
from openai import OpenAI
# api_key is the gateway's key here, not a provider key
client = OpenAI(api_key="sk-anything", base_url="http://localhost:4000")
client.chat.completions.create(model="workhorse", messages=[...])

Portkey — hosted, headers route the request:

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://api.portkey.ai/v1",
    default_headers={
        # Authenticates you to Portkey
        "x-portkey-api-key": os.environ["PORTKEY_API_KEY"],
        # ID of a saved Portkey config holding the routing and cache rules
        "x-portkey-config": "cfg-routing-and-cache",
    },
)

When to add a gateway

You use 2+ providers in production.
You need centralized cost control across teams.
You want a single audit log of every LLM call your org makes.
You have regulated workloads that need centralized PII redaction.
You're managing per-tenant rate limits and don't want to reinvent them.

You don't need a gateway on day one. Most teams add one within the first year of production, once they have a real reason.

Pricing & cost notes

Gateway	Cost model
Portkey	Free tier; ~$49+/mo paid; usage on top
OpenRouter	Provider price + ~5.5% markup; pay-as-you-go
LiteLLM Proxy	Free OSS; your hosting cost
Cloudflare AI Gateway	Free at low volume; cheap thereafter
Helicone	Free 100k req; $25+/mo
Kong AI Gateway CE	Free OSS; Enterprise tiers $$$$
Bedrock / Vertex AI	Provider price; no markup, in-cloud egress free

Gateway cost is small relative to provider spend. The real value is the cache hit rate (often 20–40%) and the incident-avoidance when one provider hiccups.
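A back-of-the-envelope version of that claim, as a sketch. The function name and parameters are hypothetical, and it assumes that cache hits cost nothing at the provider and that requests cost the same on average; neither assumption holds exactly in practice.

```python
def monthly_cost(provider_spend: float, cache_hit_rate: float,
                 gateway_fee: float = 0.0, markup: float = 0.0) -> float:
    """Estimate the monthly bill when traffic goes through a gateway.

    provider_spend: what you would pay providers with no cache at all
    cache_hit_rate: fraction of requests served from cache (0.0 to 1.0)
    gateway_fee:    flat monthly gateway subscription, if any
    markup:         fractional markup on provider price, if any
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    billed = provider_spend * (1.0 - cache_hit_rate)
    return billed * (1.0 + markup) + gateway_fee


# $10,000/month of provider spend, 25% cache hit rate, $49/month gateway fee
print(monthly_cost(10_000, 0.25, gateway_fee=49))   # 7549.0
```

At these illustrative numbers the flat fee is well under 1% of the bill, while the cache removes a quarter of it.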

Pitfalls

Adding a gateway before you have one provider working. Premature abstraction; you'll over-design for problems you don't have.
Caching responses that include user-specific content. Cache "What is the policy on returns?", not "Email Jane about her return." Use per-user cache keys or no cache.
Trusting fallback to be free. If both providers are slow today, you've now waited for both. Set tight timeouts on the primary.
Gateway in a region far from all of your providers. Latency adds up. Pick a region close to whichever provider you call most.
No circuit breaker. Repeated 503s from Anthropic shouldn't keep retrying for 30 seconds; trip the breaker, route around.
Provider creds in app code AND gateway. Pick one place to hold credentials. Usually the gateway.
Gateway as a single point of failure. Run two instances; health-check; have a way to bypass to direct provider calls during a gateway outage.
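The circuit-breaker pitfall above can be sketched in a few lines. This is a minimal illustration, not how any particular gateway implements it: the class name, the thresholds, and the injectable `clock` parameter (there so the behavior can be tested without sleeping) are all assumptions.

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive errors; probe again after `cooldown_s`."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None   # None means the breaker is closed

    def allow(self) -> bool:
        """True if a request may be sent to this provider right now."""
        if self._opened_at is None:
            return True
        # After the cooldown, let requests through again as probes;
        # the next recorded result closes or re-opens the breaker.
        return self._clock() - self._opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.max_failures:
            self._opened_at = self._clock()
```

A caller checks `allow()` before each request to the primary and routes to the fallback when it returns `False`, so repeated 503s stop costing a timeout on every call.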
