Skip to main content

Case study: Perplexity

In one line: Perplexity is an "answer engine" — you ask a question, it searches the live web, retrieves a handful of pages, and answers using only those pages with inline citations — and the architecture is essentially a productized RAG pipeline tuned for freshness, citation faithfulness, and follow-up coherence.

In plain English

Perplexity is a search engine that gives you a written answer with footnotes instead of ten blue links — it searches the live web, picks a handful of trustworthy pages, and writes an answer that cites them. Under the hood it's the most famous production example of RAG: retrieve first, generate second, and verify the citations before showing the user anything. Study this page because the citation-verification pattern and the query-routing pattern transfer directly to almost any product that answers questions from a body of content.

The product

A search engine where the result is an answer, not a list of links. Three main modes:

  • Quick search — single-pass answer to a typical query.
  • Pro search — multi-step research: clarify, decompose, retrieve, synthesize.
  • Deep research — multi-minute background job producing a long structured report.

Distinct from Google: the result is synthesized; distinct from ChatGPT: the answer is grounded in retrievable, currently-live sources.

Architecture

User queryClassifierquick / pro /shopping / etc.RetrievalWeb searchBing / proprietaryindexNews indexAcademic /specialized indexesPages fetched+ parsedRerank + select5-10 top sourcesSnippet selectionper sourceModelgenerate answer w/citationsCitation verifiereach claim → citedsourceStreamed answerw/ inline sourcepillsFollow-upsuggestions

The whole pipeline runs in seconds for quick search, tens of seconds for Pro, minutes for Deep Research.

Key engineering decisions

1. Owned retrieval, not just API'd Bing

Early Perplexity used Bing's search API; later they built proprietary indexes on top, plus dedicated indexes for specific domains (news, scholarly, finance). The reason: search-quality is the primary lever on answer quality. They couldn't afford to be downstream of someone else's ranking.

This is the "your moat is the retrieval, not the model" lesson — true for every RAG product, and very true here.

2. Citation enforcement at the model layer

The model is instructed (and likely fine-tuned) to emit citations inline: [1], [2], with a structured mapping back to retrieved sources. The render layer turns those into clickable source pills.

The non-obvious engineering: the model can't just claim a citation. There's a verification step that checks each claim against the cited source. Claims with weak grounding can be flagged or the answer can be retried.

This is the cardinal RAG quality lever. See RAG basics and Safety & privacy.

3. Multiple model tiers per query type

Quick search uses a fast model (latency budget: <3s end-to-end including retrieval). Pro and Deep Research use frontier-tier reasoning models. The router decides which.

For Deep Research specifically, the agent loop runs for minutes — decomposing the query, retrieving across multiple sub-queries, building an outline, drafting, refining. This is closer to a long-running agent than a chat call.

4. Source diversity and trust tiers

Not every web page is equal. Internal classifiers tier sources (high-trust: established publishers, scholarly, primary sources; low-trust: spam, content farms, AI-generated SEO sludge). The retrieval ranker weights these.

This is downstream of the second-order problem: as the web fills with AI-generated content, generic web retrieval gets noisier. Tiered source quality is a defense.

5. Follow-up coherence

The follow-up suggestions ("ask about X next") and the multi-turn conversation flow are designed so the user can drill down without losing context. The conversation state carries enough context that "What about Q2?" works after a question about Q1 earnings.

Stack snapshot (2026)

  • Models: mix of OpenAI, Anthropic, and open-source models (Llama-family) fine-tuned for citation behavior; Perplexity has trained their own ("Sonar") for some queries.
  • Retrieval: proprietary web index + Bing API + specialty indexes (news, academic).
  • Infrastructure: AWS-heavy historically; significant inference cost optimization (custom serving for Sonar models).
  • Frontend: Next.js + custom rendering of the source-pill format.

What to copy

  • Citation pills + verifier. The single biggest UX win in a RAG product. Users see the grounding; they trust the answer.
  • Tier your queries. Not every question deserves Deep Research; not every question can be answered in a one-shot. Route by intent.
  • Owned retrieval where you can. If your domain is narrow enough, building your own index beats using a generic API.
  • Source quality tiers. "We trust this site more" is a real signal worth encoding.
  • Multi-turn coherence as a deliberate design. Most chat products fail on the third turn. Test follow-ups explicitly.

What to avoid

  • One model for all query types. Quick search and Deep Research have wildly different requirements.
  • Letting the model cite without verification. Models will cite confidently and wrongly. Always verify.
  • Generic web retrieval in a content-farm world. Filter or your answers degrade as the open web gets noisier.
  • Ignoring the "is this a question we should answer at all" filter. Some queries (medical, legal, financial advice) need different handling than "what's the population of Lisbon."
What people get wrong when copying this
  • Copying the inline-citation UI without the verification step behind it. Citation pills over unverified claims make hallucinations look more trustworthy — worse than no citations at all.
  • Sending every query through the heaviest pipeline because Deep Research demos well. Most queries need the fast path; the intent router is what keeps cost and latency survivable.
  • Treating retrieval as a commodity API call. Perplexity's lesson is that answer quality is mostly search quality — you can't fix bad retrieval downstream with a better model.
  • Ignoring source-quality tiers, so the product quietly degrades as AI-generated content floods the open web.
→ Going deeper

Perplexity lives or dies on citation faithfulness — verifying that an answer is actually grounded in its sources is the core problem of Chapter 5: Evaluation & Measurement, especially LLM-as-judge for scoring faithfulness at scale.

Sources

  • Perplexity engineering blog posts on Sonar models and retrieval.
  • Aravind Srinivas (CEO) interviews on Lex Fridman, Lenny's Podcast.
  • AI Engineer Summit talks by Perplexity engineers.
  • Public discussions of citation-enforcement and reranking.
🤔 Quick checkQuick check

→ Next: Sierra