A field guide to designing, building, evaluating, shipping, and operating LLM-powered applications — from your first API call to production at enterprise scale.
# An LLM only ever sees tokens, not words.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
text = "tokenization is fun"
toks = enc.encode(text)
print(text)  # → tokenization is fun
print(toks)  # → e.g. [3239, 2065, 374, 2523] (exact IDs depend on the encoding)
Every advanced feature — chat, search, agents, multimodal — is layered on top of that single primitive. Master the primitive and the rest is assembly.
Add prompting, retrieval, and evals to what you already know. If you can build a CRUD app, you’re 70% of the way there.
Written so a beginner can follow along while still being useful to working engineers. Read in order, or jump to what you need.
THEME 01
Tokens, embeddings, the transformer, context windows, sampling, streaming, tool calling, RAG, and agent loops: just enough to be useful.
Read Foundations →

THEME 02
Every major provider, framework, and service: what it does, when to use it, why it exists, and what it replaces.
Read Tech Stack →

THEME 03
Solo indie builder, 20-person AI startup, and 2,000-engineer enterprise: three radically different ways to ship the same feature.
Compare workflows →

THEME 04
From “idea” to “shipped and measured,” plus the patterns that recur in every production LLM app.
Read Lifecycle →

THEME 05
The recurring “should we…” debates, each with a concrete decision rule instead of hand-waving.
Read Decisions →

THEME 06
What an AI engineer actually does in 2026, the specialization tracks, and how to position yourself.
Read Career →

Designed so you can master one topic per page and always know what comes next.
The whole guide fans out from one idea. Twenty minutes from now you’ll know exactly why every bill, every context limit, and every latency number is measured in tokens.
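To make the "everything is measured in tokens" idea concrete, here is a minimal sketch of the arithmetic. The `ModelProfile` fields, the `estimate` helper, and every number in `demo` are hypothetical values chosen for illustration, not any real provider's pricing or limits. The sketch also assumes the context window counts prompt and completion tokens together, which is common but worth checking for the model you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical model characteristics, for illustration only."""
    context_window: int              # assumed max tokens per request (prompt + completion)
    usd_per_million_input: float     # assumed price per 1M prompt tokens
    usd_per_million_output: float    # assumed price per 1M completion tokens
    output_tokens_per_second: float  # assumed generation speed


def estimate(profile: ModelProfile, prompt_tokens: int, completion_tokens: int) -> dict:
    """Derive context fit, cost, and generation time from token counts alone."""
    total_tokens = prompt_tokens + completion_tokens
    cost_usd = (
        prompt_tokens * profile.usd_per_million_input
        + completion_tokens * profile.usd_per_million_output
    ) / 1_000_000
    return {
        "fits_context": total_tokens <= profile.context_window,
        "cost_usd": cost_usd,
        "generation_seconds": completion_tokens / profile.output_tokens_per_second,
    }


# Made-up numbers: an 8,000-token window, $1 / $4 per million tokens, 50 tokens/s.
demo = ModelProfile(
    context_window=8_000,
    usd_per_million_input=1.0,
    usd_per_million_output=4.0,
    output_tokens_per_second=50.0,
)
result = estimate(demo, prompt_tokens=1_500, completion_tokens=500)
print(result)
```

With these made-up numbers, a 1,500-token prompt and a 500-token reply fit in the window, cost about $0.0035, and take roughly 10 seconds to generate. In practice the token counts would come from a tokenizer or from the usage figures an API returns.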