A Realistic Cost Breakdown
In one line: A 20-person AI-first startup typically spends $50K–$200K/month on infrastructure and tools, with provider API calls as the single largest variable line.
At pure SaaS scale, infrastructure is noise next to payroll. At AI-first scale, infrastructure (especially LLM API spend) is a large line item — sometimes the largest after salaries. A 20-person AI startup spending $150K/month on provider APIs is normal, not alarming. The conversation shifts from "is the bill too high" to "is per-tenant $/answer healthy."
A 20-person AI startup at ~$3M ARR
| Category | Item | Monthly cost | Notes |
|---|---|---|---|
| LLM providers | Anthropic primary | $25K–$120K | Single largest variable line |
| OpenAI fallback + embeddings | $5K–$30K | Embeddings + fallback traffic | |
| Bedrock for compliance customers | $0–$15K | Optional, regulated customers | |
| Gateway | Portkey | $300–$2K | Pays for itself first outage |
| Eval + observability | Braintrust OR Langfuse | $500–$3K | Pick one |
| Sentry | $100–$500 | ||
| Datadog OR Better Stack | $500–$3K | Logs + metrics + uptime | |
| Vector / DB | Supabase Pro (incl. pgvector) | $500–$3K | Until you outgrow |
| Pinecone / Turbopuffer (if used) | $0–$8K | At-scale only | |
| Hosting | Vercel Team / Pro | $500–$3K | Scales with bandwidth + function time |
| Modal / Render for Python workers | $500–$5K | If you have Python workers | |
| Background | Inngest / Trigger.dev | $200–$2K | |
| Product / flags | PostHog or Statsig | $0–$2K | Generous free tiers |
| Auth | Clerk | $200–$2K | Per-MAU pricing |
| Resend | $100–$500 | ||
| Dev tools | GitHub Team | $80 (20 users) | |
| Linear | $160 (20 users) | ||
| Doppler / 1Password | $200–$400 | Secrets | |
| Cursor or Claude Code teams | $400–$1K | AI dev tools (yes, your devs pay this) | |
| Compliance | Vanta / Drata / Secureframe | $400–$1.5K | Once SOC 2 starts |
| Auditor fees | ~$2K/mo amortized | $25K Type II audit / year | |
| Misc | Domain, monitoring extras | $50–$300 | |
| Total | $50K–$200K | Provider $ swings the range |
For comparison: 20 engineers fully loaded is roughly $300K–$500K/month. Infrastructure is 10–40% of payroll at AI-first scale — much higher than the 1–5% typical at pure SaaS scale.
Where the variance comes from
The $50K–$200K range is wide because the LLM provider line dominates and scales with usage:
- Quiet feature, mid-tier model, light context: ~$0.001/answer.
- Heavy retrieval, flagship model, large context: ~$0.10/answer.
- Agent loop with tool calls and re-prompting: ~$1+/answer.
A startup running 500K answers/day at $0.05/answer is at $25K/day on providers alone — and that's a single, busy feature. The discipline is knowing your $/answer per feature and treating it like a product KPI.
Healthy cost ratios
At ~$3M ARR with $100K/month infra spend:
- Infrastructure = ~40% of MRR. Tight, but normal for AI-first startups in early scale.
- Provider spend should be ~50–70% of total infra spend (rest = obs, hosting, dev tools).
- Per-feature cost should be tracked weekly; no feature should silently 2x without explanation.
A startup at 60%+ infra-to-MRR is in trouble; one at <20% is either pricing wrong or under-shipping AI value.
The big levers (in order of impact)
- Model selection per feature. Routing routine work to mid-tier models often saves 60%+ on that feature.
- Prompt caching. Anthropic's prompt caching can cut 50–80% off repeated-context features. Free win if structured right.
max_tokensceilings. A surprising number of features have no ceiling and bleed money on overlong outputs.- Semantic caching at the gateway. Identical or near-identical queries served from cache. ~20–40% savings on chat features.
- Per-tenant caps. Free tier and lowest paid tier need usage caps. A handful of abusive users can 10x your bill.
- Batch APIs. For non-realtime workloads (overnight summarization, etc.), Anthropic/OpenAI batch APIs are ~50% cheaper.
When to invest engineering time in cost reduction
A reasonable rule:
- If a cost-reduction project saves > $5K/month and takes < 2 engineer-weeks, do it now.
- If it saves $1–5K/month, queue it for the next sprint.
- If it saves < $1K/month, ignore — the engineer time costs more than the savings.
Track "hours spent saving dollars" and you'll catch yourself optimizing the cheap bills.
A 22-person AI startup's provider bill grows from $48K to $72K over four months. Cofounders nervous. The AI engineer does a one-week audit:
- Summarization feature: on flagship model. Eval shows mid-tier is within 1 point of flagship. Switch → saves $9K/month.
- Customer-support assistant: repeated long system prompt + retrieval context. Enable Anthropic prompt caching → saves $7K/month.
- Two abusive free-tier tenants: running scripts against the API. Per-tenant cap on free tier → saves $3K/month.
- Background tagging: moved to batch API → saves $4K/month.
Total: $23K/month savings, ~$280K/year. One engineer-week of work.
The lesson: cost optimization, like eval discipline, is a quarterly muscle. Once a year is too rarely; every sprint is too often.
The instinct from pure SaaS — "the database is the big bill" — is wrong in AI startups. Postgres is $1K; the provider bill is $80K. Engineering attention should follow the bill. Most AI engineers ignore provider spend until it's a fire. Build the dashboards on day one and check them weekly.
A real-world line-item walkthrough
A 22-person AI startup at $3.2M ARR, October 2026, monthly bill (representative numbers):
| Line | Spend |
|---|---|
| Anthropic (primary, claude-sonnet-4.5) | $78,000 |
| OpenAI (fallback + text-embedding-3) | $14,200 |
| Bedrock (one HIPAA customer) | $6,800 |
| Portkey gateway | $890 |
| Braintrust (evals + traces) | $1,400 |
| Sentry | $310 |
| Datadog | $1,600 |
| Supabase Pro + pgvector | $1,100 |
| Vercel Team | $1,250 |
| Modal (Python workers, batch embed) | $2,100 |
| Inngest | $480 |
| PostHog (flags + analytics) | $0 (free tier still) |
| Clerk auth | $620 |
| Resend | $180 |
| GitHub + Linear + Doppler + Cursor | $1,250 |
| Vanta + amortized auditor | $2,100 |
| Misc | $200 |
| Total | $112,480 |
That's ~42% of MRR. Provider $ (Anthropic + OpenAI + Bedrock) = $99K = 88% of the infra bill. Everything else is small.
When the provider bill becomes a hiring decision
There's an inflection where the provider bill is large enough that one engineer dedicated to cost pays for itself. Rough rule:
- Bill > $50K/month → quarterly cost reviews are enough.
- Bill > $100K/month → monthly cost reviews + named owner part-time.
- Bill > $250K/month → one dedicated engineer on cost optimization full-time pays for themselves within 2 quarters.
- Bill > $500K/month → small platform team focused on cost + reliability.
The engineer's job: model selection per feature, prompt caching coverage, batch API migration, per-tenant caps, contract renegotiation with providers.
Common mistakes
- No per-feature cost dashboard. "The bill went up" is unhelpful. "Feature X went up 80% week-over-week" is actionable.
- No per-tenant cost view. A single abusive user can 5x your bill in a day. You need to see them.
- Treating LLM cost as fixed. It's the most variable cost in the stack and the most optimizable.
- Not setting alerts. First time you'll hear about runaway usage is the monthly invoice — two weeks late. Set alerts at 2x, 5x, 10x normal.
- Spending an engineer-week to save $200/month. A senior engineer's time is ~$1,500/day. Cost-optimization projects must clear a "saves > $5K/month" bar.
What's next
→ Continue to Day in the Life for a worked day in the life of an AI engineer at a 20-person startup.