Retrieval & memory | Modern AI Engineer Guide

📄️Vector search

Finding the K most semantically similar pieces of text by comparing embedding vectors. The "find nearest neighbors in 1,536-dimensional space" primitive.
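
As a rough illustration of the nearest-neighbor primitive, here is a minimal brute-force sketch in plain Python. It assumes cosine similarity as the distance measure and uses tiny 2-dimensional vectors in place of real embeddings; the function names `cosine_similarity` and `top_k` are hypothetical, and a production system would typically use an approximate index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy stand-ins for embeddings (assumption: real ones have far more dimensions).
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
nearest = top_k([1.0, 0.0], vectors, k=2)
```

The full scan costs one similarity computation per stored vector, which is why larger collections usually move to an approximate nearest-neighbor index.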

📄️Hybrid search

BM25 (keyword) plus vector (semantic) search, blended. Each catches what the other misses. The 2026 production default.
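
The blurb above does not say how the two result lists are blended. One common option is reciprocal rank fusion (RRF), sketched below under that assumption; the function name `reciprocal_rank_fusion` and the document IDs are hypothetical, and `k=60` is a conventional constant rather than a value taken from this guide.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical outputs of a keyword retriever and a vector retriever.
bm25_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

RRF only needs ranks, not raw scores, which sidesteps the problem that BM25 scores and vector similarities live on different scales.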

📄️Chunking strategies

Fixed-token vs semantic vs layout-aware vs hierarchical. Overlap, units, and why chunking affects RAG quality more than any other knob.
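
To make the fixed-token strategy and the overlap setting concrete, here is a minimal sketch. It assumes the text has already been split into tokens (any list works, so integers stand in for real tokenizer output); the function name `chunk_tokens` and its parameters are hypothetical.

```python
def chunk_tokens(tokens, size, overlap):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in the range [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


# Integers stand in for tokens; each chunk repeats the last token of the previous one.
chunks = chunk_tokens(list(range(10)), size=4, overlap=1)
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some tokens twice.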

📄️Reranking

Cross-encoder rerankers (Cohere Rerank, BGE, voyage-rerank). The 'cheap retrieval -> expensive rerank' pattern that wins production RAG.
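
The two-stage pattern can be sketched without committing to any particular reranker. In the example below, `cheap_score` and `expensive_score` are hypothetical callables supplied by the caller; the toy word-overlap and Jaccard scorers merely stand in for a first-stage retriever and a cross-encoder, and do not reflect the API of any of the products named above.

```python
def retrieve_then_rerank(query, docs, cheap_score, expensive_score,
                         candidates=50, final=5):
    """Shortlist with a cheap scorer, then reorder the shortlist with a costly one."""
    shortlist = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)
    shortlist = shortlist[:candidates]
    reranked = sorted(shortlist, key=lambda d: expensive_score(query, d), reverse=True)
    return reranked[:final]


def word_overlap(query, doc):
    """Toy first-stage scorer: number of shared words."""
    return len(set(query.split()) & set(doc.split()))


def jaccard(query, doc):
    """Toy second-stage scorer (a stand-in for a cross-encoder)."""
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / len(q | d) if q | d else 0.0


docs = ["apple pie recipe", "apple stock price", "banana bread"]
best = retrieve_then_rerank("apple pie", docs, word_overlap, jaccard,
                            candidates=2, final=1)
```

The expensive scorer only ever sees `candidates` documents, so its cost stays bounded however large the collection grows.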

📄️RAG basics

Retrieval-Augmented Generation — handing the model relevant documents at query time so it can answer from real data instead of guessing.

📄️Memory

Giving an LLM continuity across conversations — short-term, long-term, episodic, and the patterns that actually work in production.