The model | Modern AI Engineer Guide

📄️Tokens

The unit an LLM reads and writes. Why "tokens" instead of words, and what that means for billing, prompts, and context windows.

📄️Tokenizers

BPE, SentencePiece, tiktoken: the algorithms and libraries that split your text into tokens, why the same string yields different tokens (and token counts) across models, and how to count tokens before you send.

📄️Embeddings

A vector of floats that captures the meaning of a piece of text. The basis for semantic search, RAG, deduplication, and classification.

📄️The transformer (just enough)

The neural network architecture behind nearly every modern LLM. Just enough for later decisions to make sense, with no calculus.

📄️Training vs. inference

Why training is rare and inference is your daily reality — and why this distinction shapes every cost, latency, and tooling decision.

📄️Reasoning models

o1/o3, Claude extended thinking, DeepSeek R1, Gemini Deep Think. Models that "think" before responding — when they're worth it, when they're not, and how to prompt them differently.

📄️Model families

Frontier vs workhorse vs small. Closed vs open. Reasoning models vs base chat models. The durable map of the model landscape.