Using the API | Modern AI Engineer Guide

📄️Messages — system, user, assistant

The shape of every modern LLM API call — system prompt for instructions, then alternating user and assistant turns.

📄️Prompting — the craft

Chain-of-thought, ReAct, self-consistency, prompt chaining, few-shot vs zero-shot, role assignment. The repeatable techniques behind reliable prompts.

📄️Context windows

The hard cap on how many tokens the model can see and emit in one call. Why bigger isn't always better.

📄️Prompt caching

Reusing the model's KV cache across calls when the prompt prefix is identical. 5-10x cost savings, dramatically faster TTFT.

📄️Sampling — temperature, top_p, top_k

How the next token is picked from the model's probability distribution. The knobs that make outputs more deterministic or more creative.

📄️Streaming

Sending tokens to the client as they're generated, instead of waiting for the full response. Required UX for any chat-style feature.

📄️Structured output

Forcing the model to return JSON, or even better, JSON that conforms to a schema. The bridge between LLM text and traditional code.

📄️Tool use / function calling

Letting the model emit a structured call (function name + args) that your code then executes. The foundation of every agent.

📄️Function calling, deep

Parallel tools, forced tool choice, streaming partial JSON, structured output via tools. The patterns that turn basic tool use into production agents.

📄️MCP — the Model Context Protocol

The 2024-released open protocol for connecting LLM clients to tools, data, and prompts. The standard that ate function-calling glue in 2025-2026.

📄️Multimodal inputs

Vision (image URLs and base64), audio (Whisper-class STT, Realtime), and document inputs. What changes, what costs, and where it shines.