Papers Worth Reading

In one line: Most AI papers don't matter for shipping AI. A short list of foundational ones gives you the conceptual vocabulary; the rest you skim only when they intersect a problem you're hitting.

In plain English

Here's a relieving secret: you can ship production AI for years without reading a single research paper. Most of the job is engineering — prompts, evals, retrieval, deployment — not research. But about ten foundational papers gave the field its shared vocabulary, and knowing them makes everything newer dramatically easier to skim. This page lists those ten, then shows you how to triage the endless rest in minutes instead of hours.

1. The honest claim

You can ship production AI for years without reading a single research paper. Most AI engineering is engineering — prompts, evals, retrieval, observability, deployment. The papers are interesting but rarely actionable.

That said, ~10 foundational papers give you the conceptual vocabulary that every newer paper references. Knowing them makes everything else easier to skim.

2. The foundational ten (read these, in this order)

Transformer architecture

  1. Attention Is All You Need (Vaswani et al., 2017): the original transformer paper. The architecture every LLM is built on.

Scale and capability

  2. Language Models are Few-Shot Learners (Brown et al., 2020; the GPT-3 paper): how scaling up a language model produces strong few-shot performance without fine-tuning; the "in-context learning" concept.
  3. Scaling Laws for Neural Language Models (Kaplan et al., 2020): how loss scales with model size, dataset size, and compute.

Alignment and instruction-following

  4. Training Language Models to Follow Instructions with Human Feedback (Ouyang et al., 2022; the InstructGPT paper): RLHF, and why instruction-tuned models such as GPT-3.5 / ChatGPT behave the way they do.

Tools, agents, retrieval

  5. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020): the original RAG paper.
  6. ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022): the tool-use loop pattern.
  7. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022): why prompting for intermediate reasoning steps improves answers. The zero-shot "let's think step by step" phrasing comes from a follow-up paper (Kojima et al., 2022).

Modern surveys and frames

  8. A Survey of Large Language Models (Zhao et al., 2023+ updates): a periodically updated landscape view; skim the latest version.
  9. The Bitter Lesson (Sutton, 2019; an essay, not a paper): why general methods that leverage compute beat domain-specific cleverness.
  10. Building Effective Agents (Anthropic, 2024; an engineering essay, not academic): current best primer on agent patterns from a major lab.

Read these and you have the vocabulary to follow any newer paper. ~30 hours of reading total.

3. How to actually read a paper

You don't have to read papers like a textbook. The triage method:

  1. Encounter a paper. Read the title, abstract, and conclusion (~5 min).
  2. Does the claim address a problem I'm working on? If no: bookmark + move on. If yes: read the introduction and method headers (~15 min).
  3. Is the methodology still relevant? If no: bookmark + move on. If yes: skim the full paper, focusing on figures, tables, and ablations (~30-60 min).
  4. Run the technique against my eval set.

About 90% of papers stop at "bookmark + move on," and that is the right outcome. The remaining 10% are the ones you might actually use; test each of them before adopting it.
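The triage method above is a short decision procedure, so it can be written down as one. The following is a minimal sketch, not a prescribed tool: the names `triage`, `Outcome`, and `TriageResult` are hypothetical, the time budgets are the upper bounds quoted in the steps, and it assumes that both "no" answers lead to "bookmark + move on".

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    BOOKMARK = "bookmark + move on"
    TEST_ON_EVALS = "run the technique against my eval set"


@dataclass
class TriageResult:
    outcome: Outcome
    minutes_spent: int  # upper bound, using the time budgets from the triage steps


def triage(addresses_my_problem: bool, method_still_relevant: bool) -> TriageResult:
    """Walk the triage steps in order, stopping at the first "no"."""
    minutes = 5  # title, abstract, conclusion
    if not addresses_my_problem:
        return TriageResult(Outcome.BOOKMARK, minutes)

    minutes += 15  # introduction + method headers
    if not method_still_relevant:
        return TriageResult(Outcome.BOOKMARK, minutes)

    minutes += 60  # full skim: figures, tables, ablations (30-60 min)
    return TriageResult(Outcome.TEST_ON_EVALS, minutes)
```

The point of the sketch is the cost asymmetry: a paper that fails the first question costs about 5 minutes, while one that survives every check costs over an hour before you have run anything.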

4. The categories worth tracking

You don't need every paper, but knowing the categories helps:

| Category | Cadence | Why |
|---|---|---|
| Frontier model technical reports | Per release | What's new in capability |
| RAG / retrieval methods | Monthly | This area is moving fast |
| Agent / planning architectures | Monthly | Same |
| Eval methodology | Quarterly | Slowly improving |
| Prompting techniques | Quarterly | Mostly diminishing returns |
| Long-context tricks | Quarterly | When you hit context limits |
| Safety / alignment | Quarterly | Slow but important |
| Quantization / efficient inference | If you self-host | Only if relevant |

5. Where to find the worthwhile ones

  • arXiv cs.CL and cs.LG — the firehose. Use it via a curated filter, not directly.
  • Papers with Code — adds the "reproducible?" signal.
  • Latent Space podcast — weekly summary by people who read more than you can.
  • The Sequence / Import AI / The Batch — newsletters with paper digests.
  • AlphaSignal, a daily brief of AI papers with decent signal-to-noise.
  • Twitter / X lists (see Part IV-1) — researchers post their own papers; aggregators retweet the important ones.
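One way to read "a curated filter" for the arXiv firehose is a keyword watchlist tied to what you are building. The sketch below is illustrative only: `Paper`, `WATCHLIST`, `matches_watchlist`, and `curate` are hypothetical names, the watchlist terms are placeholders, and it assumes you already have titles and abstracts from some feed (fetching them is out of scope here).

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    abstract: str


# Hypothetical watchlist: replace with terms tied to problems you are actually working on.
WATCHLIST = ("retrieval", "rerank", "agent", "long context")


def matches_watchlist(paper: Paper, watchlist=WATCHLIST) -> list[str]:
    """Return the watchlist terms found in the title or abstract (naive substring match)."""
    text = f"{paper.title} {paper.abstract}".lower()
    return [term for term in watchlist if term in text]


def curate(papers: list[Paper], watchlist=WATCHLIST) -> list[Paper]:
    """Keep only papers that mention at least one watchlist term."""
    return [p for p in papers if matches_watchlist(p, watchlist)]
```

Substring matching is deliberately crude. It over-matches (for example, "agent" also hits "reagent"), which is acceptable for a first pass because everything that survives still goes through the 5-minute triage.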

6. The lab blog posts are often better than the papers

For practical engineering, the major labs' engineering blog posts are often higher signal than their research papers.

Engineering posts tell you "here's how to use this in your app." Papers tell you "here's why this exists." For shipping, the first is usually what you need.

7. The papers worth re-reading

A few papers reward re-reading at different career stages:

  • Attention is All You Need — at year 0 (architecture overview), at year 2 (multi-head attention details), at year 4 (positional encoding choices).
  • Scaling Laws — at year 0 (the existence of the laws), at year 2 (why frontier-tier costs what it does).
  • The Bitter Lesson — annually. It's short, and the lesson is unintuitive enough that re-reading recalibrates your priors.

8. When NOT to read papers

Specifically, don't:

  • Read a paper to "stay current" if it doesn't address something you're building. The cost of context-switching to academic prose outweighs the benefit.
  • Read 12 RAG papers before building your first RAG. Build first. Read after.
  • Read a paper to refute someone on Twitter. The expected ROI is negative.

9. The "what would I cite?" test

Useful self-check: in a technical discussion with another AI engineer, would you actually cite this paper to make a point? If not, the paper probably wasn't worth your time. If yes, you remembered it, which means you internalized it.

Most papers fail this test. The foundational ten pass it constantly.

10. The bibliography habit

Keep a simple ~/notes/papers-read.md — title, link, one-sentence takeaway, date.

After two years you have:

  • A scannable list of what you've read.
  • A reference when you need to cite something.
  • A growth artifact — early entries look unsophisticated; that's progress.
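The habit sticks better when adding an entry takes one call. Here is a minimal sketch of an append-only logger for that file. The function names `format_entry` and `log_paper` and the Markdown bullet layout are assumptions made for illustration; only the four fields (title, link, one-sentence takeaway, date) come from the text above.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def format_entry(title: str, link: str, takeaway: str, read_on: date) -> str:
    """Render one log line: date, linked title, one-sentence takeaway."""
    return f"- {read_on.isoformat()} [{title}]({link}): {takeaway}\n"


def log_paper(log_path: Path, title: str, link: str, takeaway: str,
              read_on: Optional[date] = None) -> str:
    """Append one entry to the log file, creating the file and folder if needed."""
    entry = format_entry(title, link, takeaway, read_on or date.today())
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```

A typical call would pass `Path.home() / "notes" / "papers-read.md"` as `log_path`. Putting the date first keeps the file chronologically scannable, and appending rather than rewriting means early entries stay untouched as the growth artifact the list describes.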

Common mistakes

Where people commonly trip up
  • Trying to read everything new on arXiv. The volume is unsurvivable; ~85% of papers are noise. Filter ruthlessly.
  • Reading papers without building. You "understand" RAG from a paper but have never built one. Building is the test of understanding.
  • Treating engineering blog posts as inferior. For practical AI engineering, lab blog posts often beat papers — they're written for engineers, not reviewers.
  • Skipping the foundational ten. Reading newer papers without the foundations is like reading a sequel without having read the original.
  • Reading to "look smart." If the only audience for your reading is yourself-pretending-to-be-impressive, skip it. Read what you'll use.

→ Next: Communities and conferences — where production AI engineers actually congregate.