Messages — system, user, assistant
The shape of every modern LLM API call — system prompt for instructions, then alternating user and assistant turns.
Prompting — the craft
Chain-of-thought, ReAct, self-consistency, prompt chaining, few-shot vs zero-shot, role assignment. The repeatable techniques behind reliable prompts.
Context windows
The hard cap on how many tokens the model can see and emit in one call. Why bigger isn't always better.
Prompt caching
Reusing the model's KV cache across calls when the prompt prefix is identical. 5-10x cost savings, dramatically faster TTFT.
Sampling — temperature, top_p, top_k
How the next token is picked from the model's probability distribution. The knobs that make outputs more deterministic or more creative.
Streaming
Sending tokens to the client as they're generated, instead of waiting for the full response. Required UX for any chat-style feature.
Structured output
Forcing the model to return JSON, or even better, JSON that conforms to a schema. The bridge between LLM text and traditional code.
Tool use / function calling
Letting the model emit a structured call (function name + args) that your code then executes. The foundation of every agent.
Function calling, deep
Parallel tools, forced tool choice, streaming partial JSON, structured output via tools. The patterns that turn basic tool use into production agents.
MCP — the Model Context Protocol
The 2024-released open protocol for connecting LLM clients to tools, data, and prompts. The standard that ate function-calling glue in 2025-2026.
Multimodal inputs
Vision (image URLs and base64), audio (Whisper-class STT, Realtime), and document inputs. What changes, what costs, and where it shines.