Pretraining vs fine-tuning vs alignment

Three words get blurred together when people describe how models are made. They are different stages with different jobs. Here is what each one does.

Research · 2026-04-08 17:04 KST · Lead Editor · 7 min read

When people describe how a modern language model is made, three words get used almost interchangeably: pretraining, fine-tuning, and alignment. They are not the same thing. They are three distinct stages, done in order, each with a different goal, a different kind of data, and a different effect on the final model. Confusing them leads to confused expectations about what a model can do and why it behaves the way it does.

Here is the shape of it: pretraining gives a model raw knowledge and fluency, fine-tuning teaches it to do specific tasks in a useful form, and alignment shapes how it behaves and what it refuses. Knowledge, skill, behavior — three different things, built in three passes.

Pretraining: learning the world from text

Pretraining is the first and by far the largest stage. The model is shown an enormous quantity of text and given one deceptively simple job: predict what comes next. Over and over, across a very large corpus, it guesses the next token (a word or a fragment of a word) and adjusts its parameters according to how wrong the guess was.

This sounds trivial, but to predict the next word well across a large share of what humans have written, a model has to absorb a staggering amount along the way: grammar, facts, styles, reasoning patterns, the structure of arguments, the rhythm of dialogue. Next-word prediction is the task; broad competence is the byproduct. This is where the model gets its fluency and the bulk of what it knows.
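
To make the objective concrete, here is a minimal Python sketch of next-word prediction on a toy corpus. It is an illustration under heavy simplifying assumptions, not how production models are trained: it counts which word follows which instead of adjusting the parameters of a neural network, it splits on whitespace rather than using a real tokenizer, and the corpus and function names (train_bigram, next_token_probs, average_loss) are invented for this example. What it shares with real pretraining is the quantity being minimized: the average negative log-probability of the word that actually came next.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn the counts for one context into a probability distribution."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

def average_loss(counts, tokens, floor=1e-9):
    """Mean negative log-probability given to each actual next token."""
    losses = []
    for prev, nxt in zip(tokens, tokens[1:]):
        # Unseen continuations get a tiny floor probability (an assumption
        # of this sketch) so the logarithm stays defined.
        p = next_token_probs(counts, prev).get(nxt, floor)
        losses.append(-math.log(p))
    return sum(losses) / len(losses)

# Toy corpus, invented for illustration.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
model = train_bigram(corpus)

print(next_token_probs(model, "sat"))
print(next_token_probs(model, "the"))
print(average_loss(model, corpus))
```

In this toy corpus the model is certain that "on" follows "sat" and splits its guess four ways after "the", so all of the loss comes from the positions that are genuinely ambiguous.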

The key thing about pretraining is that it is unfocused on purpose. The model is not learning to be helpful, or to answer questions, or to follow instructions. It is learning to continue text of every kind. A freshly pretrained model is enormously knowledgeable and almost unusable — ask it a question and it might continue with more questions, because that is a plausible continuation of the text it saw. It has the raw material of intelligence but none of the manners.

Fine-tuning: turning knowledge into usable skill

Fine-tuning takes the broadly capable but unfocused pretrained model and teaches it to behave in a particular, useful way. Instead of oceans of undifferentiated text, the model is shown a smaller, curated set of examples that demonstrate the desired behavior: questions paired with good answers, instructions paired with correct responses, tasks paired with the right form of output.

The model already knows a great deal from pretraining. Fine-tuning does not teach it new facts so much as teach it how to put what it knows to work. It learns that when text looks like a question, the expected continuation is an answer; that an instruction should be followed rather than echoed; that a request for a summary calls for a summary. The raw capability was already there. Fine-tuning channels it into a usable shape.
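
A small Python sketch can show what a fine-tuning example looks like as data. It assumes one common convention, which the description above does not require: the prompt and the desired response are joined into a single sequence, and only the response tokens count toward the loss. The whitespace tokenization, the example text, and the names build_example and masked_average are hypothetical and chosen only for illustration.

```python
def build_example(prompt, response):
    """Join a prompt with its target response and mark the trained tokens."""
    prompt_tokens = prompt.split()
    response_tokens = response.split()
    tokens = prompt_tokens + response_tokens
    # 0 = context only (the prompt), 1 = contributes to the loss (the response)
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask

def masked_average(per_token_losses, loss_mask):
    """Average the per-token loss over response tokens only."""
    kept = [loss for loss, m in zip(per_token_losses, loss_mask) if m == 1]
    return sum(kept) / len(kept)

# Hypothetical curated example: a question paired with a good answer.
tokens, mask = build_example("Question: What is 2 + 2 ? Answer:", "4")
print(list(zip(tokens, mask)))

# Made-up per-token losses: high on the prompt, low on the answer.
losses = [5.0] * 8 + [0.5]
print(masked_average(losses, mask))
```

Under this convention the model is never penalized for failing to predict the question itself. The training signal comes only from how well it produces the answer, which matches the idea of steering existing capability rather than teaching new text.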

This stage is far smaller and cheaper than pretraining because it is steering an existing model rather than building one. It is also where most task-specific adaptation happens. The same pretrained foundation can be fine-tuned in different directions — toward a helpful assistant, toward a coding tool, toward a narrow specialist — without redoing the expensive first stage.

Alignment: shaping behavior, values, and judgment

Alignment is about how the model behaves once it is capable and useful. A fine-tuned model can answer questions and follow instructions, but that is not enough. We also want it to be honest about what it does not know, to decline harmful requests, to avoid confidently inventing facts, and to respond in a tone that is genuinely helpful rather than merely plausible. Alignment is the stage that works on these qualities.

The defining challenge of alignment is that the desired behavior is hard to specify with examples alone. It is easy to write a correct answer to a math question; it is much harder to write out, example by example, exactly how a model should handle an ambiguous, sensitive, or adversarial request. So alignment often relies on a different signal: human judgments about which of two responses is better, used to teach the model a general sense of preferred behavior rather than a fixed answer key.
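
One common way to turn "response A is better than response B" into a training signal is a pairwise loss of the Bradley-Terry form, sketched below in Python. This is an assumption for illustration: the text above does not name a specific method, and the scores here are hypothetical numbers standing in for the output of a learned scoring model. The function name preference_loss is likewise invented.

```python
import math

def preference_loss(score_preferred, score_rejected):
    """Negative log-probability that the preferred response wins.

    Equals -log(sigmoid(score_preferred - score_rejected)), computed in a
    numerically stable way.
    """
    x = score_rejected - score_preferred
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Hypothetical scores for two responses to the same request.
print(preference_loss(2.0, 0.0))  # scorer agrees with the human judgment
print(preference_loss(0.0, 0.0))  # scorer is indifferent
print(preference_loss(0.0, 2.0))  # scorer disagrees with the human judgment
```

The loss is small when the scorer already ranks the human-preferred response higher and grows as it ranks the pair the wrong way round. No fixed "correct answer" appears anywhere, only a comparison, which is what lets this signal generalize beyond an answer key.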

The result is a model whose dispositions — its helpfulness, its caution, its honesty about uncertainty — have been shaped, not just its skills. Alignment is why a well-made assistant declines to help with something dangerous, admits when it is unsure, and stays on task instead of drifting. It is the difference between a model that can do things and a model you would actually trust to do them.

Why the order matters

These stages are not interchangeable, and they are run in this sequence for a reason. There is little to align in a model that does not yet have skills, and little skill to draw out of a model that has no knowledge. Each stage builds on the layer beneath it.

Pretraining supplies the raw substrate of knowledge and fluency. Fine-tuning, working on top of that, shapes it into a tool that does useful things. Alignment, working on top of that, governs how the tool behaves in the open-ended, messy situations real use throws at it. Skip the foundation and the later stages have nothing to work with; skip the later stages and you have raw capability with no manners or judgment.

Why so much knowledge comes from the first stage

A common misconception is that a model learns facts during fine-tuning or alignment. Mostly it does not. The overwhelming majority of what a model knows is laid down during pretraining, when it sees the vast bulk of its data. The later stages are comparatively tiny and are about behavior and form, not about loading in new information.

This has a practical consequence. If a model is missing knowledge, whether about recent events or about your private documents, fine-tuning and alignment are usually not the fix, because they are not where most knowledge enters. The better tools are supplying the information in the prompt at the time you ask or, at large expense, redoing parts of the knowledge-heavy first stage. Understanding which stage does what tells you which lever to reach for.
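
Supplying information at the time you ask can be sketched in a few lines of Python. This is a deliberately naive illustration: the documents are invented, relevance is plain word overlap where a real system would typically use a proper search index, and score and build_prompt are hypothetical helper names. No model is called; the sketch only shows how the missing knowledge ends up in the prompt instead of in the model's weights.

```python
def score(query, document):
    """Count distinct query words that also appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words)

def build_prompt(query, documents, top_k=1):
    """Put the most relevant documents in front of the question."""
    ranked = sorted(documents, key=lambda d: score(query, d), reverse=True)
    context = "\n".join(ranked[:top_k])
    return (
        "Use the context to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Hypothetical private documents the model never saw during training.
docs = [
    "The office wifi password rotates every Monday.",
    "Expense reports are due on the fifth of each month.",
]
prompt = build_prompt("When are expense reports due?", docs)
print(prompt)
```

The model's parameters are untouched. The fact it needs is simply present in the text it is asked to continue, which is why this approach suits recent or private information better than another round of fine-tuning.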

What none of these stages guarantee

It is worth being clear about the limits of all three. None of them makes a model infallible. Pretraining can absorb errors and biases present in its data. Fine-tuning can make a model fluent at a task without making it correct. Alignment can shape behavior in the cases it was trained on while leaving gaps in cases it was not. The stages reduce problems; they do not eliminate them.

In particular, a well-aligned, well-fine-tuned model can still produce confident, fluent, and wrong answers, because fluency and truth are different things and only the first is directly trained. The three-stage process is how today's most capable and well-behaved models are built, but it is a process of shaping tendencies, not installing guarantees. Treating its output as reliable-by-construction is the mistake the process itself cannot prevent.

The takeaway

Pretraining, fine-tuning, and alignment are three stages with three jobs: knowledge, skill, and behavior. Pretraining floods the model with text until it is fluent and knowledgeable but unfocused. Fine-tuning shapes that raw capability into useful task behavior. Alignment governs how the model conducts itself in the open-ended situations real use demands. They build on each other in order, and most of what a model knows arrives in the first stage. Keep the three distinct, and a lot of confusion about what models can and cannot do dissolves.
