Skip to main content

Team and Process

In one line: A solo dev is the entire AI org; a startup has 1–10 AI engineers and a Slack channel; an enterprise has a platform team, feature teams, an AI Center of Excellence, and a written approval path for every model and prompt.

In plain English

The cleanest predictor of how fast a change ships isn't the model, the framework, or the cloud — it's how many humans have to nod before it goes live.

Solo: zero (you nod at yourself). Startup: one reviewer, maybe a PM glancing at the prompt diff. Enterprise: a code reviewer, a tech lead, an AI safety partner, a security partner, sometimes legal, sometimes a model risk committee. Every nod is also a wait.

The trick is matching the number of nods to the actual blast radius of being wrong.

Team shape

RoleSoloStartup AI TeamEnterprise AI
AI engineers1 (you)1–1050–500+ across feature teams
Platform / infra00–1 part-time10–30 on a dedicated AI platform team
MLOps / LLMOps00–1Dedicated function, 5–20 people
Prompt engineers / AI PMs00–2 hybrid rolesSpecialized, dozens
AI safety / red team00Dedicated function
Legal / privacy partner0Shared with rest of companyEmbedded per business unit
Model risk / governance00A committee (often required by regulators)
AI Center of Excellence (CoE)N/AN/AYes — sets standards, vets vendors, runs internal LLM days

The biggest cultural break is again between startup and enterprise. A startup AI team is generalists who ship end-to-end: the same person writes the prompt, builds the retrieval, wires the eval, and watches the dashboard. An enterprise splits those into specialized teams that have to coordinate via tickets.

For each column in depth, see Chapter 9: Solo, Chapter 10: Startup, and Chapter 11: Enterprise.

Decision style

DecisionSoloStartupEnterprise
Pick a modelWhim, vibes-test on 5 promptsEval suite + cost spreadsheetBake-off + bias evals + vendor security review + procurement (3–9 months)
Tweak a promptEdit + pushPR, one reviewer, evals in CIPR + prompt registry update + peer review + (high-risk tier) AI safety review
Add a tool to an agentJust add itPR + eval that exercises itThreat model + permission scoping review + audit log plumbing
Move from RAG to fine-tuneSaturday experimentSprint with stakeholder buy-inCross-team RFC, data governance review, training-data lineage doc
Change embedding modelRe-embed Saturday nightMigration plan + dual-index for a week + cohort cutoverQuarter-long program, vendor re-vetting, compliance impact assessment
Add a new providerSign up with credit cardAdd a row to the gateway configVendor onboarding committee + security review + DPIA + contract negotiation
IdeaSolo: ship(0 humans, minutes)Startup: PR + 1review(2 humans, hours)Enterprise: designdoc → review →registry → rollout(8–20 humans, weeks)
Highlight: the prompt registry is the dividing line

The single artifact that most clearly separates startup from enterprise AI work is the prompt registry — a versioned, reviewed, audited store of every prompt running in production.

At solo and startup scale, prompts live in code; the git log is the registry. At enterprise scale, prompts have to be discoverable across teams, attributable to an owner, auditable for compliance, and changeable without a code deploy. That's a whole platform, often built in-house, sometimes 5+ engineers' full-time job.

If you're at startup scale and someone proposes building a prompt registry "to be ready," ask which specific upcoming customer or regulator requires it. Usually the answer is none, and you've just lit three months on fire.

Time-from-idea-to-live, by change type

ChangeSoloStartupEnterprise
Typo in a prompt90 seconds30 minutes1–2 days (or 1–6 weeks for High-risk tier)
New tool for an agent30 minutes1–3 days4–8 weeks
New AI featureHours to days2 weeks1–3 months
Replace primary modelAn evening1–2 weeks (gateway + evals + cohort rollout)1–2 quarters (re-vet vendor, re-run evals, change management)
Adopt a new providerSame dayA week3–9 months
Worked example: same prompt tweak, three orgs

A user reports the chatbot misses a follow-up question 1 in 5 times. Engineer wants to add one sentence to the system prompt.

  • Solo: opens system_prompt.md, adds the sentence, runs Promptfoo locally, ships. Total: 5 minutes. 1 human (themselves) involved.
  • Startup: opens PR with the diff, eval suite runs in CI on the standard 200-case set, one teammate reviews, merge auto-deploys behind a feature flag, gradual rollout over 24 hours, watches Langfuse dashboard. Total: a day, ~30 active minutes. 2 humans involved.
  • Enterprise: opens PR, prompt registry entry updated, tech lead reviews, AI safety partner checks change is Low risk (it is), evals run on 5,000-case battery + bias suite, change goes into the next deploy window, canary at 1% for 6 hours, then 10%, then 100% over a week, change advisory board sees the entry on their weekly digest. Total: 1–2 weeks. 8+ humans involved.

The actual thinking — "add this sentence" — took the same 5 minutes everywhere. Everything else is risk-absorption process, priced against the cost of being wrong.

What stays the same / what changes

Stays the same: every column has prompts in version control. Every column runs some eval before shipping. Every column has someone who can press a kill switch.

Changes: the number of nods between idea and live, the formality of the eval bar, the blast radius a kill switch protects against. At solo scale a kill switch protects you from your own bill; at enterprise scale it protects against an SEC filing.

Stakeholders per change

A useful proxy for "which column am I in?" is how many distinct humans get involved in a typical AI change.

ChangeSoloStartupEnterprise
Prompt tweak12 (author + reviewer)4–8 (author + reviewer + tech lead + AI safety + sometimes legal)
New AI feature14–6 (PM + AI engineer + reviewer + designer + CS lead)10–20 (above + AI safety partner + security partner + privacy + product council + pilot customers)
New provider onboarding12–315+ across procurement, legal, security, AI CoE, platform team
Production incident13–5 (on-call + AI eng + PM + CS)10–30 across incident command, comms, legal, executive, sometimes regulators

The stakeholder count is the cleanest single predictor of calendar time. Each additional stakeholder is roughly a +1 day of wait time at startup scale and +1 week at enterprise scale, even when everyone moves fast.

Common mistakes

  • Importing enterprise governance into a 3-person AI team. A prompt registry, AI safety partner, and risk-tier review for a 3-engineer team isn't "raising the bar" — it's a tax that hands the market to the competitor still on git + Promptfoo.
  • Believing "we have evals" means "we have process." A test suite is a check; a process is who runs it, when, and what blocks if it fails. The startup that runs evals locally but skips them in CI on Fridays is solo with extra steps.
  • Romanticizing the platform team. From the outside, an enterprise AI platform team looks like leverage. From the inside it's a permanent ticket queue. Don't build one until feature teams are actively blocked on shared infra they can't build themselves.
  • Confusing AI CoE with AI velocity. A Center of Excellence is a coordination function for an org with dozens of independent AI efforts. At one or two efforts, calling a Slack thread a CoE doesn't speed anything up — it just adds meetings.
  • Hiring a head of AI before the second AI engineer. A "head of AI" with no team is a job title looking for a problem; for the first one or two AI engineers, the right manager is whoever is already managing the relevant product team. Add the dedicated leadership role when the coordination cost across multiple AI engineers actually exceeds one engineering manager's attention.
🤔 Quick checkQuick check

→ Next: Stack comparison.