Retrieval Beats Hallucination
Learn why LLMs hallucinate, and how retrieval plus llm.txt files ground answers in reality.
At the Staff+ level, you’re responsible for building AI systems that are reliable, scalable, and safe to depend on.
One of the most common—and most dangerous—failure modes in AI systems? Hallucination.
Even state-of-the-art models hallucinate. It’s simply baked into how they’re trained: saying “I don’t know” rarely earns a reward during training, so models get very good at guessing confidently.
Hallucinations usually fall into two categories:
Extrinsic hallucinations: Fabrications about the outside world (e.g., a refund policy that never existed).
Intrinsic hallucinations: Slips on mechanical tasks where the model produces a fluent but incorrect answer (e.g., counting the “b”s in “blueberry”).
The fix for both is the same: retrieval-augmented generation (RAG)—and it’s your best line of defense.
The power of RAG
RAG lets a model look up facts from a curated knowledge base.
Instead of asking the model to “remember,” you supply authoritative knowledge at runtime. That flips it from storyteller to fact-checker.
Let’s say you ask GPT, “What’s MCP?”
Without retrieval, the model might answer: “MCP was introduced by Google in 2019 as part of TensorFlow Extended.”
This is polished, but completely wrong. MCP isn’t tied to Google or TensorFlow, and there was never a “2019 release.”
With retrieval, the model answers: “According to the MCP docs, MCP is an open-source standard that lets AI applications like Claude or ChatGPT connect to external systems such as databases, files, and tools. The docs describe it as a ‘USB-C for AI,’ but they don’t mention a release date or a single company owner.”
Much better.
Same model, different behavior—the only differentiating factor is retrieval.
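To make that difference concrete, here’s a minimal sketch using the OpenAI Python SDK. The model name and the retrieved snippet are illustrative placeholders; in a real system the CONTEXT would come from your retrieval layer.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

question = "What's MCP?"

# Without retrieval: the model answers from memory and may confidently fabricate.
ungrounded = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use your model of choice
    messages=[{"role": "user", "content": question}],
)

# With retrieval: the same question, plus a snippet pulled from an authoritative source.
context = (
    "MCP (Model Context Protocol) is an open-source standard that lets AI "
    "applications connect to external systems such as databases, files, and tools."
)
grounded = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "Answer only using the provided CONTEXT. "
                       "If the CONTEXT doesn't cover it, say so.",
        },
        {"role": "user", "content": f"CONTEXT:\n{context}\n\nQUESTION: {question}"},
    ],
)

print(ungrounded.choices[0].message.content)
print(grounded.choices[0].message.content)
```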
A simple retrieval workflow
Sticking with the MCP example, applying RAG means you wouldn’t just chat with the model when exploring a new technology.
Instead, you’d create a lightweight briefing file:
Generate a file (llm.txt) from authoritative sources (docs, specs, APIs). Think of it as the model’s briefing packet.
Feed it into your model (or index it with your retrieval layer).
In prompts, explicitly say: “Answer only using the provided CONTEXT.”
Tools like Firecrawl’s create-llmstxt-py automate the scraping and summarizing.
Setup requires two API keys:
Firecrawl API key
OpenAI API key
And setup is simple:
git clone https://github.com/firecrawl/create-llmstxt-py
cd create-llmstxt-py
pip install -r requirements.txt
cp .env.example .env  # then add your API keys
python generate-llmstxt.py https://example.com --max-urls 50
This outputs llm.txt (briefing) and llm-full.txt (detailed version), which you can feed into your LLM of choice.
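As a sketch of “feed it into your LLM of choice” (assuming the OpenAI Python SDK, the file names above, and an arbitrary character budget to stay inside the context window):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# llm-full.txt is the detailed briefing generated above; cap it so the prompt
# stays within the model's context window (the budget below is an assumption).
MAX_CHARS = 60_000

with open("llm-full.txt", encoding="utf-8") as f:
    briefing = f.read()[:MAX_CHARS]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; swap in your model of choice
    messages=[
        {
            "role": "system",
            "content": "Answer only using the provided CONTEXT. "
                       "If the answer isn't there, say you don't know.",
        },
        {"role": "user", "content": f"CONTEXT:\n{briefing}\n\nQUESTION: What's MCP?"},
    ],
)
print(response.choices[0].message.content)
```

Stuffing the whole briefing into the prompt works for small docs; once it outgrows the context window, you index it and retrieve only the relevant chunks, which is what a full pipeline does.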
There are plenty of other RAG tools out there. A quick search will turn up many commercial options, but they’re often pricey. Firecrawl's create-llmstxt-py is a great place to start.
Building a RAG pipeline
If you want to push beyond the llm.txt workflow above, the next level is building full RAG pipelines yourself.
A RAG pipeline is the production-grade version of that simple workflow:
Docs are ingested, chunked, indexed, retrieved, re-ranked, cited, evaluated, and refreshed on schedule.
It’s engineered with guardrails (abstentions, schema-constrained outputs, eval metrics, observability); see the sketch below.
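The sketch below covers the retrieval core plus an abstain guardrail. It uses TF-IDF (via scikit-learn) as a stand-in for an embedding model and vector store; the file name, chunk sizes, and score threshold are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Ingest + chunk: split the detailed briefing into overlapping chunks.
def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

with open("llm-full.txt", encoding="utf-8") as f:
    chunks = chunk(f.read())

# Index: TF-IDF stands in for embeddings + a vector database.
vectorizer = TfidfVectorizer()
index = vectorizer.fit_transform(chunks)

def retrieve(query: str, k: int = 3, min_score: float = 0.1):
    """Return the top-k chunks above a score threshold (empty list = abstain)."""
    scores = cosine_similarity(vectorizer.transform([query]), index)[0]
    ranked = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)[:k]
    return [(chunks[i], score) for i, score in ranked if score >= min_score]

hits = retrieve("What does MCP connect AI applications to?")
if not hits:
    print("Not enough evidence in the knowledge base; abstaining.")  # guardrail
else:
    for text, score in hits:
        print(f"[score={score:.2f}] {text[:120]}...")
```

In production you’d swap TF-IDF for embeddings, add re-ranking and citations, wire in eval metrics and observability, and refresh the index on a schedule.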
With a local RAG setup, you can feed your models authoritative knowledge, deploy systems into production, and stop treating AI as a novelty tool.
Get a guided start on building RAG pipelines with our hands-on courses:
👉 Fundamentals of Retrieval-Augmented Generation with LangChain: Learn to apply LangChain to implement a RAG pipeline and build a frontend app for your pipeline with Streamlit.
👉 Advanced RAG Techniques: Choosing the Right Approach: Go deeper with different RAG approaches, post-retrieval optimization methods, and designing RAG-based chatbots.