Reduce hallucinations: a practical checklist

Models invent facts when the task invites them to. This checklist covers the moves that cut hallucinations without pretending you can eliminate them.

tutorials · 2026-05-03 10:46 KST · Lead Editor · 7 min read

A hallucination is a confident statement that isn't true. Language models produce them because they are built to generate plausible continuations, and a plausible-sounding falsehood is, from the model's point of view, a perfectly good continuation. You cannot prompt this tendency away entirely. You can, however, reshape the task so the model has less reason to fabricate and more reason to admit uncertainty. This checklist covers the changes that make a real difference.

Give the model the facts instead of asking it to recall them

The biggest single reduction in hallucinations comes from not relying on the model's memory at all. When you ask a model to answer from what it happened to absorb in training, you are asking it to recall, and recall is exactly where fabrication lives. When you instead provide the relevant source material in the prompt and ask the model to answer from that, you change the task from "remember this" to "read this and report." Reading is far more reliable than remembering.

This is the core idea behind retrieval-augmented setups, but you don't need infrastructure to benefit. Even pasting the relevant document into the prompt for a one-off question dramatically reduces invention, because the answer is now in front of the model rather than reconstructed from fuzzy memory. Whenever the facts exist somewhere you can fetch them, fetch them and put them in the context.
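
As a minimal sketch of "fetch them and put them in the context", the helper below places source text ahead of the question. The function name, the document tags, and the sample text are illustrative assumptions, not part of any particular library or vendor format.

```python
def build_grounded_prompt(question: str, documents: list[str]) -> str:
    """Put fetched source text in front of the question so the model reads it."""
    # Number each document so a later step can refer to it by index.
    # The <document> wrapper is a hypothetical convention, not a required format.
    blocks = [
        f'<document index="{i}">\n{text.strip()}\n</document>'
        for i, text in enumerate(documents, start=1)
    ]
    return "\n\n".join(blocks) + f"\n\nQuestion: {question.strip()}"


prompt = build_grounded_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days of purchase."],
)
```

The resulting string is what you would send to the model in place of the bare question.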

Tell the model it is allowed to say "I don't know"

Models fabricate partly because nothing told them not to. Faced with a question they can't answer from the available information, the default behavior is to produce a confident guess, because a guess is a more probable continuation than silence. The fix is to explicitly authorize the non-answer: "If the provided context does not contain the answer, say that it is not stated rather than guessing."

This one instruction changes behavior more than its simplicity suggests. It gives the model a sanctioned escape hatch, so the safe move is no longer to invent. Make the instruction specific to your situation — "not in the document," "not enough information," "out of scope" — so the model knows which kind of non-answer to give. Without this, you are implicitly asking for a guess every time the real answer is unavailable.
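
One way to make the sanctioned non-answer easy to detect downstream is to ask for a fixed marker. The sketch below assumes a sentinel string, NOT_STATED, chosen purely for illustration; any phrase that cannot be mistaken for a real answer would do.

```python
# Hypothetical sentinel: pick any string unlikely to occur in a genuine answer.
NOT_STATED = "NOT_STATED"


def with_escape_hatch(prompt: str) -> str:
    """Append an explicit instruction that authorizes a non-answer."""
    return (
        f"{prompt}\n\n"
        "If the provided context does not contain the answer, "
        f"reply with exactly {NOT_STATED} rather than guessing."
    )


def is_non_answer(reply: str) -> bool:
    """True when the reply is the sentinel, ignoring case, spacing, and a trailing period."""
    return reply.strip().strip(".").upper() == NOT_STATED
```

Calling code can then branch on is_non_answer(reply) and route the question to a fallback, such as a human or a wider search, instead of showing a guess.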

Constrain the answer to the source

Providing source material helps; restricting the answer to it helps more. There is a difference between "here is a document, answer the question" and "answer the question using only the information in this document; do not add outside knowledge." The second framing tells the model that the document is the boundary, not just a hint. Anything the model wants to say that isn't supported by the source is, by instruction, off limits.

Pair this with a request to ground claims in the source — to point at the part of the document each statement comes from. The act of attributing a claim forces the model to check whether the claim is actually present, and statements that can't be attributed are exactly the ones most likely to be invented. Grounding is both a quality improvement and a detection mechanism.
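
Grounding can double as a detection mechanism with a few lines of code. The sketch below assumes the model was asked to return a JSON list of objects with "claim" and "quote" fields; that output shape and the function names are assumptions made for illustration.

```python
import json


def _normalize(text: str) -> str:
    # Collapse whitespace and ignore case so line wrapping does not break matching.
    return " ".join(text.split()).lower()


def unsupported_claims(model_json: str, source: str) -> list[str]:
    """Return claims whose supporting quote is missing or absent from the source."""
    haystack = _normalize(source)
    flagged = []
    for item in json.loads(model_json):
        quote = _normalize(item.get("quote", ""))
        if not quote or quote not in haystack:
            flagged.append(item["claim"])
    return flagged
```

A quote that appears in the source does not prove the claim follows from it, so this check only catches the crudest inventions; it is a filter for review, not a substitute for it.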

Ask for reasoning before the answer on hard questions

For questions that require connecting several facts or steps, demanding an immediate answer invites the model to commit before it has worked anything out — and a premature commitment is fertile ground for fabrication. Asking the model to reason through the question first, then state its conclusion, gives it room to notice when the pieces don't actually support an answer.

The benefit is twofold. The reasoning often produces a better answer, because the model can catch its own gaps mid-stream. And the reasoning is inspectable: when you read the steps, an unsupported leap is visible in a way that a bare conclusion is not. For simple lookups, skip this — the overhead isn't worth it. For anything requiring synthesis, it both reduces and exposes invention.
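
To keep the reasoning inspectable, it helps to ask for it in a form you can separate from the conclusion. The following sketch assumes a tag convention, <reasoning> and <answer>, chosen for illustration; the instruction wording is likewise only an example.

```python
import re

# Hypothetical instruction text; adapt the wording to your own task.
REASON_FIRST = (
    "Work through the question step by step inside <reasoning> tags, "
    "citing the facts you rely on. Then give the conclusion inside <answer> tags. "
    "If the steps do not support a conclusion, say so inside <answer>."
)


def split_reasoning(reply: str) -> tuple[str, str]:
    """Return (reasoning, answer); a part is an empty string if its tags are missing."""
    def grab(tag: str) -> str:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", reply, flags=re.DOTALL)
        return match.group(1).strip() if match else ""

    return grab("reasoning"), grab("answer")
```

Storing the two parts separately lets a reviewer read the steps when an answer looks suspicious, while end users see only the conclusion.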

Narrow the scope of each request

Long, sprawling tasks invite more hallucination than short, scoped ones, because a model juggling many sub-questions in one pass has more opportunities to wander off the supported path. Breaking a complex request into smaller, well-defined pieces — each with its own clear inputs and a clear notion of what a correct answer looks like — keeps the model on a shorter leash for each step.

Smaller tasks are also easier to verify. When a request produces one focused claim, you can check that claim against its source. When it produces a ten-paragraph essay weaving dozens of claims together, verification becomes impractical and errors slip through. Scoping is partly about reducing fabrication and partly about making the fabrication that remains catchable.
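
The scoping idea can be sketched as a list of small tasks, each carrying its own context and its own acceptance check. Everything here is hypothetical scaffolding: ScopedTask, run_scoped, and the ask callable (a stand-in for whatever function sends a prompt to your model) are names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScopedTask:
    question: str                         # one focused question
    context: str                          # only the source text this question needs
    is_acceptable: Callable[[str], bool]  # what a plausible answer must satisfy


def run_scoped(
    tasks: list[ScopedTask], ask: Callable[[str], str]
) -> list[tuple[str, str, bool]]:
    """Run each task separately and record (question, reply, passed_check)."""
    results = []
    for task in tasks:
        reply = ask(f"{task.context}\n\nQuestion: {task.question}")
        results.append((task.question, reply, task.is_acceptable(reply)))
    return results
```

Because each reply is tied to one question and one check, a failed check points at a single claim to re-examine rather than at a whole essay.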

Verify the output rather than trusting it

No amount of prompting makes a model's output trustworthy by default, so the final item on the checklist is verification. For anything that matters, the answer is a draft to be checked, not a fact to be published. The check can be human review, but it can also be automated: confirm that cited sources actually contain the claimed information, that numbers add up, that referenced items exist.

Design the system so verification is possible. Outputs that point at their sources can be checked against those sources. Outputs that state where a fact came from can be traced. Outputs that are confident, unattributed prose are unverifiable by construction, and unverifiable is where undetected hallucinations live. The goal is not a model that never errs — it is a pipeline where the errors that occur are the kind you can catch before they reach anyone.
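
Two of the automated checks mentioned above, that numbers are supported and that referenced items exist, can be approximated cheaply. This is a rough sketch under simple assumptions: numbers are matched as plain digit strings, and references are identifiers you can look up in a known set. The function names are illustrative.

```python
import re


def numbers_in(text: str) -> set[str]:
    """Collect integer and decimal literals as strings, e.g. {'12', '4.5'}."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def unsupported_numbers(answer: str, source: str) -> set[str]:
    """Numbers that appear in the answer but nowhere in the source."""
    return numbers_in(answer) - numbers_in(source)


def missing_references(cited_ids: list[str], known_ids: set[str]) -> list[str]:
    """Cited identifiers that do not exist in the known collection."""
    return [ref for ref in cited_ids if ref not in known_ids]
```

A non-empty result from either function is a reason to hold the output for review. An empty result is weaker evidence: a number can be present in the source and still be attached to the wrong fact.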

The takeaway

You reduce hallucinations by changing the task, not by scolding the model. Give it the facts so it reads instead of recalls. Authorize it to say it doesn't know. Restrict the answer to the source and ask it to ground its claims. Let it reason before concluding on hard questions, scope requests small enough to verify, and treat every important output as a draft to check rather than a fact to trust. None of these eliminates hallucination — nothing does — but together they turn a system that fabricates confidently into one that mostly admits its limits and exposes the rest for checking.

#hallucinations #reliability #grounding #evaluation

Primary sources

Anthropic: prompt engineering overview
OpenAI: prompt engineering guide