The model | Modern AI Guide

📄️Tokens

The unit an LLM reads and writes. Why "tokens" instead of words, and what that means for billing, prompts, and context windows.

📄️Tokenizers

BPE, SentencePiece, tiktoken — the algorithms that split your text into tokens, why the same string varies across models, and how to count tokens before you send.

📄️Embeddings

A vector of floats that captures the meaning of a piece of text. The basis for semantic search, RAG, deduplication, classification.

📄️Neural networks (the machine under every model)

A neural network from zero — neurons, weights, layers, the forward pass, and how training nudges weights to fit examples. The deep-learning foundation that every LLM, including the transformer, is built on.

📄️The transformer (just enough)

The neural network architecture behind every modern LLM. Just enough to make decisions later make sense — no calculus.

📄️Training vs. inference

Why training is rare and inference is your daily reality — and why this distinction shapes every cost, latency, and tooling decision.

📄️Quantization (smaller weights, cheaper inference)

Storing model weights at lower precision (FP8, INT8, INT4) to fit bigger models on smaller GPUs and serve them faster — the memory math, the quality trade-off, and when a quantized big model beats a small one.

📄️Reasoning models

Reasoning-effort dials, Claude extended thinking, DeepSeek V4 thinking mode, Gemini Deep Think. Models that "think" before responding — when they're worth it, when they're not, and how to prompt them differently.

📄️Model families

Frontier vs workhorse vs small. Closed vs open. Reasoning models vs base chat models. The durable map of the model landscape.

📄️Where LLMs fail (and why)

The systematic blind spots of a next-token predictor — counting, exact math, state-tracking, fresh facts, and confident wrong answers — and the first-principles reason each one happens, so you know when to distrust the model.