Skip to main content

AI gateways

Dated content — June 2026

This page names specific tools, models, and prices, which rotate quarterly. The selection logic is durable; the names are a snapshot. Cross-check the Model snapshot for current model names and pricing.

In one line: A reverse proxy in front of LLM providers. You call the gateway; the gateway calls the provider. In return you get routing, fallback, caching, rate limiting, cost tracking, PII redaction, and one bill.

In plain English

A gateway sits between your application and every LLM provider. To your app, it looks like one URL with one API key. Behind it, the gateway can route to Anthropic, OpenAI, Google, your self-hosted vLLM, or all of them in some priority order. You stop hard-coding provider names; you start declaring policies ("send classification to Haiku; fail over to Mini if Haiku is down; cache identical prompts for 60s"). Most teams add one within 6–12 months of going to production.

The major options (2026)

GatewayHostingOSS?RoutingCachingPIIBest for
PortkeyHosted + self-hostpartialrules + fallbackyesyesFull-featured production default
OpenRouterHosted onlynomodel-marketplacepartialbasicOne key, hundreds of models, unified billing
LiteLLM ProxySelf-hostyesrules + fallbackyespartialOSS default, OpenAI-compatible facade
Cloudflare AI GatewayHostednobasicyesbasicCheap, fast, already on Cloudflare
HeliconeHosted + OSSpartialrulesyesyesStarted as obs; now also gateway
Kong AI GatewaySelf / hostedyes (CE)rules + pluginsyesyesEnterprise; existing Kong shops
Apigee AIGoogle Cloudnoenterprise rulesyesyesEnterprise on GCP
AWS BedrockAWS onlynoone-API for many modelspartialvia GuardrailsAll-AWS shops
Vercel AI GatewayHostednobasicyesbasicVercel-deployed apps

Default pick for most teams

Portkey if you want managed, LiteLLM Proxy if you want self-hosted and free. Both speak the OpenAI-compatible API shape, support fallback and routing rules, and give you cost dashboards out of the box.

If you just want "one API key for every model" without much else: OpenRouter — fewer features but the lowest friction onboarding in the category.

When to deviate

  • You're a developer trying every model under the sun: OpenRouter — the model marketplace and unified billing are the killer feature.
  • You're entirely on AWS / Azure / GCP and want no extra vendor: Bedrock, Azure OpenAI, or Vertex AI with Apigee.
  • You're already running Kong for your other APIs: Kong AI Gateway — same control plane.
  • Cheap and basic is fine (mostly observability + cache): Cloudflare AI Gateway at near-zero cost.
  • You want gateway + observability + prompt mgmt + evals in one tool: Portkey or Helicone.

What a gateway gives you

  • Provider failover. Anthropic 503 → automatic retry on OpenAI. One incident response, one less middle-of-night page.
  • Routing rules. Cheap model for short prompts; flagship for long; specific model for code; pick by tenant tier.
  • Caching. Identical prompt → cached response, often at 1% of the original cost. Especially for system prompts and FAQ-style queries.
  • Rate limits per user / tenant / API key. Stops one runaway customer from burning your whole budget.
  • Single billing surface across providers. Procurement loves this.
  • PII / secrets scrubbing at the proxy layer, before requests leave your perimeter.
  • Centralized observability for every team's LLM calls — even teams that didn't wire up Langfuse.
  • Semantic caching. Beyond exact-match: cache "What's our refund policy?" against a previously-cached "What is your refund policy?".

Minimum integration

LiteLLM Proxy — self-hosted gateway in 20 lines of config:

# config.yaml
model_list:
- model_name: workhorse
litellm_params:
model: anthropic/claude-sonnet-4-6
api_key: os.environ/ANTHROPIC_API_KEY
- model_name: workhorse
litellm_params:
model: openai/gpt-5.1
api_key: os.environ/OPENAI_API_KEY

router_settings:
routing_strategy: simple-shuffle
fallbacks: [{"workhorse": ["openai/gpt-5.1"]}]
litellm --config config.yaml --port 4000
# Your app — point at the gateway, use any provider name
from openai import OpenAI
client = OpenAI(api_key="sk-anything", base_url="http://localhost:4000")
client.chat.completions.create(model="workhorse", messages=[...])

Portkey — hosted, headers route the request:

from openai import OpenAI
client = OpenAI(
api_key=os.environ["OPENAI_API_KEY"],
base_url="https://api.portkey.ai/v1",
default_headers={
"x-portkey-api-key": os.environ["PORTKEY_API_KEY"],
"x-portkey-config": "cfg-routing-and-cache",
},
)

When to add a gateway

  • You use 2+ providers in production.
  • You need centralized cost control across teams.
  • You want a single audit log of every LLM call your org makes.
  • You have regulated workloads that need centralized PII redaction.
  • You're managing per-tenant rate limits and don't want to reinvent.

You don't need a gateway on day one. Most teams add one in months 3–12 once they have a real reason.

Pricing & cost notes

GatewayCost model
PortkeyFree tier; ~$49+/mo paid; usage on top
OpenRouterProvider price + ~5.5% markup; pay-as-you-go
LiteLLM ProxyFree OSS; your hosting cost
Cloudflare AI GatewayFree at low volume; cheap thereafter
HeliconeFree 100k req; $25+/mo
Kong AI Gateway CEFree OSS; Enterprise tiers $$$$
Bedrock / Vertex AIProvider price; no markup, in-cloud egress free

Gateway cost is small relative to provider spend. The real value is the cache hit rate (often 20–40%) and the incident-avoidance when one provider hiccups.

Pitfalls

  • Adding a gateway before you have one provider working. Premature abstraction; you'll over-design for problems you don't have.
  • Caching responses that include user-specific content. Cache "What is the policy on returns?" not "Email Jane about her return.". Per-user cache keys or no cache.
  • Trusting fallback to be free. If both providers are slow today, you've now waited for both. Set tight timeouts on the primary.
  • Gateway in the same region as none of your providers. Latency adds up. Pick a region close to whichever provider you call most.
  • No circuit breaker. Repeated 503s from Anthropic shouldn't keep retrying for 30 seconds; trip the breaker, route around.
  • Provider creds in app code AND gateway. Pick one place to hold credentials. Usually the gateway.
  • Gateway as a single point of failure. Run two instances; health-check; have a way to bypass to direct provider calls during a gateway outage.
🤔 Quick checkQuick check

→ Next: Orchestration