Skip to main content

Orchestration

Dated content — June 2026

This page names specific tools, models, and prices, which rotate quarterly. The selection logic is durable; the names are a snapshot. Cross-check the Model snapshot for current model names and pricing.

In one line: Durable, resumable, observable execution engines. LLM workflows can take minutes (agents, batch jobs, long-context generation); a single HTTP request can't survive that. Workflow engines can.

In plain English

A normal request is "you call my endpoint, I do something, I return a response." That fits inside one HTTP timeout. An agent might take three minutes; a batch summarization job might take two hours. If your server restarts mid-flow, you lose everything you've spent on tokens. A workflow engine persists every step to durable storage so the work survives deploys, crashes, and provider rate limits. When the LLM 429s, the workflow waits, retries, and picks up where it left off.

The major options (2026)

EngineLanguageHostingStyleBest for
InngestTS-first; PyCloud + OSS dev kitDurable functions, event-drivenTS-first teams; great DX
Trigger.devTS-firstCloud + self-hostDurable jobsTS apps; simpler than Temporal
TemporalPy / Go / TS / JavaCloud + self-hostHeavyweight, language-agnosticProduction-grade, large teams
RestateTS / Py / JavaCloud + self-hostDurable + stateful entitiesAgentic patterns, virtual actors
HatchetPy / TS / GoOSS / CloudPostgres-backedSimpler than Temporal, self-hostable
PrefectPythonCloud + OSSData-eng flavoredPython data + AI pipelines
DagsterPythonCloud + OSSAsset-orientedData + ML pipelines
ModalPythonHostedServerless functions + queuesPython AI workloads; GPU-friendly
AWS Step FunctionsanyAWSCloud-native state machineAll-AWS shops
GCP Workflows / Azure Logic Appsanytheir cloudsCloud-nativeCloud-locked teams

Default pick for most teams

  • TypeScript stack: Inngest. The DX is the best in the category, the OSS dev kit runs on your laptop, durable functions are just step.run(), and the cloud tier is generous.
  • Python AI stack: Modal for the "GPU + workflow" combo, Inngest Python or Hatchet for general-purpose, Temporal when you've hit enterprise scale.
  • You want one tool that runs anywhere: Temporal. Heaviest, also most battle-tested.

When to deviate

  • Long-running agents that need to checkpoint mid-flow and resume on a new container: Temporal, Restate, or Inngest — anything with first-class durable execution.
  • Agentic patterns with per-session state (virtual actors): Restate is uniquely good here.
  • Batch / scheduled / cron AI jobs at scale: Inngest cron, Modal scheduled functions, or Prefect for data-pipeline flavor.
  • Already on a cloud platform's primitives: Step Functions, GCP Workflows, Azure Logic Apps — extra infra avoided.
  • Heavy GPU work + workflow in one tool: Modal. Hosted vLLM, batch fine-tuning, scheduled embeddings — all in one Python decorator.

Why orchestration matters for AI

  • Survives deploys, restarts, transient failures. Long-running agent on minute 8 doesn't lose state when the container restarts.
  • Resumable workflows — pick up where left off when the LLM provider returns 429.
  • Step-level retries with exponential backoff — step.run("call-llm", llm_call) retries that step only.
  • Observability for free — every step is a span, every retry is recorded.
  • Human-in-the-loop — pause for approval; resume on the click. The workflow waits cheaply.
  • Concurrency control — "no more than 5 concurrent embedding jobs per tenant."
  • Idempotency — re-running a workflow with the same ID picks up the cached result rather than re-paying.

Minimum integration

Inngest — TypeScript durable function:

import { inngest } from "./client";

export const summarizeDoc = inngest.createFunction(
{ id: "summarize-doc", retries: 3 },
{ event: "doc.uploaded" },
async ({ event, step }) => {
const text = await step.run("parse-pdf", () => parsePdf(event.data.url));
const chunks = await step.run("chunk", () => chunk(text));
const summary = await step.run("llm-summarize", async () =>
llm.complete(`Summarize: ${chunks.join("\n")}`)
);
await step.run("store", () => db.save(summary));
}
);

Each step.run is durably checkpointed. If llm-summarize 429s and the container dies, the next attempt resumes from the LLM step — not the start.

Temporal — Python workflow:

from temporalio import workflow, activity

@activity.defn
async def call_llm(prompt: str) -> str:
return await llm.complete(prompt)

@workflow.defn
class AgentWorkflow:
@workflow.run
async def run(self, query: str) -> str:
for _ in range(10):
response = await workflow.execute_activity(
call_llm, prompt=query,
start_to_close_timeout=timedelta(seconds=60),
retry_policy=RetryPolicy(maximum_attempts=5),
)
if "done" in response:
return response
query = f"Continue: {response}"
raise ApplicationError("max steps")

When to add orchestration

  • AI flows take longer than ~30 seconds.
  • You have batch / cron / scheduled AI jobs.
  • You have human-in-the-loop approval steps.
  • Failures mid-workflow are costly (tokens already spent, partial DB writes).
  • You need to fan out across many items (embed 100k docs in parallel).

For a single-request streaming chatbot, you don't need orchestration. For an agent that researches over an hour, you need it badly.

Pricing & cost notes

ToolFree tierPaid
Inngest Cloud50k steps/mo$20+/mo, usage-based
Trigger.dev Cloudgenerous free$20+/mo
Temporal Cloudtrial creditusage; enterprise $$$$
Restate Cloudbeta creditsusage
HatchetOSS freehosted starting low
Modal$30/mo free creditsusage (compute + functions)
Step Functions / Workflowsper-transition pricingtiny for typical AI scale

Orchestration cost is almost always trivial compared to LLM token cost. The savings from "no more lost mid-flight agent runs" pay for it many times over.

Pitfalls

  • Putting orchestration on a long-running HTTP request. You're sidestepping the entire point. Move the work behind an event or a queue.
  • Not making activities idempotent. Workflows retry. If your "send email" step actually sends on every retry, your users get four copies.
  • One giant workflow function. Break into steps. Each step.run is your retry / checkpoint granularity.
  • Storing large blobs in workflow state. State gets serialized on every checkpoint. Put files in S3 and pass references.
  • No start_to_close_timeout. A hung LLM call hangs the workflow until heat-death of the universe.
  • Mixing dev and prod workflow IDs. Re-deploying a workflow definition with the same ID but different code is the classic Temporal footgun. Use versioning.
  • Orchestration for what should be a function call. A 200ms LLM call doesn't need durable execution. Don't reach for the heavy hammer.
🤔 Quick checkQuick check

→ Next: Agent runtimes