Chain-of-thought: why reasoning steps help
Asking a model to "think step by step" makes it noticeably better at hard problems. That is strange if you think about it. Here is why it works.
One of the most useful tricks for getting better answers out of a language model is almost embarrassingly simple: ask it to work through the problem step by step before giving its final answer. This is called chain-of-thought, and on hard problems it can make a striking difference. What is strange is that it works at all. The model already "knows" whatever it knows; why should asking it to narrate its reasoning change the quality of its conclusions? The answer reveals something fundamental about how these models compute.
The short version: a model produces its output one piece (a token) at a time, with a roughly fixed, limited amount of computation per piece, and chain-of-thought gives it more pieces, and therefore more computation, to reach a hard answer. The steps are not decoration. They are the working space.
The problem with answering all at once
Picture asking someone a multi-step arithmetic or logic question and demanding the final answer instantly, with no chance to work anything out. For an easy question that is fine. For a hard one it is brutal — you are forcing all the intermediate reasoning to happen invisibly and at once, with no room to lay anything out.
A language model faces a version of this constraint. When it generates the next piece of text, it does a fixed amount of computation and commits to an output. If a problem requires several dependent steps of reasoning, demanding the answer immediately forces all of those steps to be compressed into that single burst of computation. For genuinely hard problems, there is simply not enough room in one step to do the work. The model is being asked to do multi-step reasoning in a single shot, and it stumbles for the same reason a person would.
How writing the steps changes the situation
Chain-of-thought removes that bottleneck in a clever way. When the model writes out its reasoning step by step, each step it produces becomes part of the text it then reads to produce the next step. The intermediate results do not have to be held in some hidden, fixed-size scratchpad. They get written down, and the written-down version is available to build on.
So instead of compressing a five-step problem into one burst of computation, the model spreads it across five (or more) bursts, each one able to read the results of the previous ones. Step one establishes a fact; step two uses that fact and adds another; and so on, until the final answer rests on a chain of intermediate results that were each computed with their own share of effort. The model is, in effect, giving itself more computation by giving itself more text to compute over.
The generated reasoning is not just an explanation produced after the fact. It is the medium in which the computation actually happens. Take it away and you take away the working room.
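The chaining described above can be sketched with a toy loop. This is only an illustration under stated assumptions, not how a real model is implemented: `run_chain`, the `steps` list, and the pen-counting problem are all hypothetical, and each "step" here is an ordinary function standing in for one burst of model computation.

```python
def run_chain(steps, transcript=None):
    """Run each step against everything written so far and append its result."""
    transcript = list(transcript or [])
    for step in steps:
        # Each step can read every earlier result, the way a model
        # rereads its own generated text before producing the next piece.
        transcript.append(step(transcript))
    return transcript


# Hypothetical problem: 4 boxes of 12 pens, 6 given away,
# the remainder shared equally among 3 people.
steps = [
    lambda t: 4 * 12,      # step 1: total pens
    lambda t: t[-1] - 6,   # step 2: pens left, built on step 1
    lambda t: t[-1] // 3,  # step 3: share per person, built on step 2
]

transcript = run_chain(steps)
print(transcript)  # [48, 42, 14]
```

No single step does the whole problem. The last value is only reachable because the earlier values were written into `transcript` first.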
Why more text means more computation
This is the crux, and it is worth stating carefully. A model spends a roughly fixed amount of computation per piece of output it generates. The total computation it can bring to bear on a problem is therefore tied to how much text it produces along the way.
A one-word answer gets roughly one unit of that computation. A long, worked-through solution gets many. By writing out its reasoning, the model is not merely showing its work; it is buying itself more total computation to reach the conclusion. Each intermediate step is another chunk of processing applied to the problem, and the written record of earlier steps lets later steps stand on them rather than redo them. This is why chain-of-thought helps most on exactly the problems that need several dependent steps, and barely matters on problems an immediate answer already handles. Easy questions do not need the extra room; hard ones do.
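As a back-of-the-envelope model of this relationship (an assumption made for illustration, not a measured law), write c for the roughly constant computation spent per generated piece of text (token) and n for the number of pieces generated before the answer is complete. The symbols are introduced here and do not come from any particular system.

```latex
C_{\text{total}} \approx c \cdot n,
\qquad
\frac{C_{\text{worked}}}{C_{\text{direct}}} \approx \frac{n_{\text{worked}}}{n_{\text{direct}}}
```

Under this simplification, a worked solution of 200 pieces gets about 200 times the computation of a one-piece answer. In practice c is only approximately constant, since the cost of each piece also depends on how much earlier text the model has to read.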
Why the steps have to be written, not just thought
A natural question: if the model has internal computation anyway, why does it need to externalize the steps as text? Why not reason silently and just emit the answer? The reason comes back to the fixed-per-step limit. The model's hidden internal processing for a single output is bounded. It cannot, within one step, run an arbitrarily long chain of reasoning internally.
Writing the steps out is how the model escapes that per-step bound. Each written step resets the budget — the next step gets its own fresh allotment of computation, and it can read everything written so far. The text is the mechanism that lets short, bounded bursts of computation be chained into something longer. Without externalizing, there is no chaining; the model is stuck doing everything within the limits of a single step. The page, so to speak, is what makes extended reasoning possible.
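The per-step bound and the way written steps get around it can be shown with a small toy model. Everything here is a hypothetical stand-in: `one_step` plays the role of a single bounded burst of computation, `budget` is an invented cap on how much work one burst may do, and the `page` list plays the role of the written text.

```python
def one_step(running_total, chunk, budget):
    """One bounded burst: at most `budget` numbers may be handled before output."""
    if len(chunk) > budget:
        raise ValueError("too much work for a single step")
    return running_total + sum(chunk)


def answer_in_one_shot(numbers, budget):
    """Try to do the whole job inside a single bounded step."""
    return one_step(0, numbers, budget)


def answer_with_written_steps(numbers, budget):
    """Chain bounded steps, writing each intermediate total to the page."""
    page = [0]  # the written record that later steps read from
    for start in range(0, len(numbers), budget):
        chunk = numbers[start:start + budget]
        # Each step gets a fresh budget and reads the last written total.
        page.append(one_step(page[-1], chunk, budget))
    return page


numbers = list(range(1, 11))  # 1 through 10

try:
    answer_in_one_shot(numbers, budget=3)
except ValueError as err:
    print("one shot failed:", err)

print(answer_with_written_steps(numbers, budget=3))  # [0, 6, 21, 45, 55]
```

The budget per step never changes between the two attempts. What changes is that the second version is allowed to write intermediate totals down and read them back.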
What chain-of-thought does not guarantee
It is important not to over-romanticize this. The reasoning a model writes out is not a guaranteed-faithful window into how it reached its answer. A model can produce a plausible-looking chain of steps that does not actually correspond to the computation that drove its conclusion, and it can reach a wrong answer through reasoning that sounds perfectly coherent. The visible steps are generated by the same fallible process as everything else the model writes.
This means chain-of-thought improves performance without making the output trustworthy by default. A confident, well-structured line of reasoning can still contain a wrong step, and the final answer inherits the error while sounding rigorous. Chain-of-thought gives the model more room to compute, which raises the ceiling on what it can solve. It does not install correctness or honesty. The reasoning is working space, not proof.
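One practical consequence is that written steps can be checked independently rather than taken on trust. The sketch below is a deliberately simplified illustration: it assumes the reasoning has already been reduced to arithmetic tuples of the form (left, operator, right, claimed result), which real model output is not, and `check_steps` is a hypothetical helper. It can catch an inconsistent step, but it says nothing about whether the written steps reflect how the answer was actually produced.

```python
def check_steps(steps):
    """Return indices of steps whose claimed result fails a recomputation."""
    operations = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    bad = []
    for index, (left, op, right, claimed) in enumerate(steps):
        if operations[op](left, right) != claimed:
            bad.append(index)
    return bad


# A chain that reads smoothly but contains one wrong step.
chain = [
    (4, "*", 12, 48),
    (48, "-", 6, 44),   # wrong: 48 - 6 is 42
    (44, "+", 10, 54),  # correct arithmetic, but built on the bad 44
]

print(check_steps(chain))  # [1]
```

The last step passes the check on its own terms, yet the final value 54 is wrong because it inherits the earlier error. That is the failure mode described above: a coherent-looking chain with a flawed link.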
When to reach for it, and when not to
Knowing the mechanism tells you when chain-of-thought is worth the cost. It shines on problems with several dependent steps — multi-step math, logic puzzles, careful analysis, anything where the answer is built rather than recalled. On these, giving the model room to lay out intermediate results genuinely raises its success rate.
It is wasteful, though, on simple lookups or single-step questions, where the extra steps add length and cost without improving the answer. The written reasoning is not free: every extra piece of output takes time to generate and adds expense. The skill is matching the tool to the problem: spend the extra working room where the problem actually needs it, and skip it where an immediate answer already suffices.
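In prompt terms, that choice can be as small as which instruction gets appended to the question. The following is a minimal sketch, not a recommended template: `build_prompt`, the two instruction strings, and the `needs_multiple_steps` flag are hypothetical, and deciding whether a question needs several dependent steps is left to the caller.

```python
STEP_BY_STEP = "Work through the problem step by step, then give the final answer."
DIRECT = "Answer directly."


def build_prompt(question, needs_multiple_steps):
    """Append a step-by-step instruction only when the problem calls for it."""
    instruction = STEP_BY_STEP if needs_multiple_steps else DIRECT
    return f"{question}\n\n{instruction}"


# A single-step lookup: skip the extra working room.
print(build_prompt("What is the capital of France?", needs_multiple_steps=False))

# A problem with dependent steps: ask for the reasoning first.
print(build_prompt(
    "A shop has 4 boxes of 12 pens, gives away 6, and splits the rest "
    "among 3 people. How many pens does each person get?",
    needs_multiple_steps=True,
))
```

The resulting string would then be sent to whatever model interface is in use. No particular API is assumed here.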
The takeaway
Chain-of-thought works because a model gets a fixed amount of computation per piece of text it produces, and writing out its reasoning gives it more pieces, and therefore more total computation, to reach a hard answer. The steps are not an explanation tacked on afterward; they are the working space where the computation happens, and externalizing them is what lets short bursts of processing chain into extended reasoning. It raises the ceiling on what a model can solve — but it does not guarantee the steps are faithful or the answer correct. Used where problems genuinely have multiple dependent steps, it is one of the highest-leverage ways to get more out of a model.
