Tokens
The unit an LLM reads and writes. Why "tokens" instead of words, and what that means for billing, prompts, and context windows.
Tokenizers
BPE, SentencePiece, tiktoken: the algorithms and libraries that split your text into tokens, why the same string tokenizes differently across models, and how to count tokens before you send.
Embeddings
A vector of floats that captures the meaning of a piece of text. The basis for semantic search, RAG, deduplication, classification.
The transformer (just enough)
The neural network architecture behind nearly every modern LLM. Just enough to make later decisions make sense, no calculus required.
Training vs. inference
Why training is rare and inference is your daily reality — and why this distinction shapes every cost, latency, and tooling decision.
Reasoning models
o1/o3, Claude extended thinking, DeepSeek R1, Gemini Deep Think. Models that "think" before responding — when they're worth it, when they're not, and how to prompt them differently.
Model families
Frontier vs. workhorse vs. small. Closed vs. open. Reasoning models vs. standard chat models. The durable map of the model landscape.