Structured output: getting reliable JSON from models

When your code needs data, not prose, the model has to return clean, parseable structure. Here is how to get reliable JSON instead of hope.

tools · 2026-05-21 08:19 KST · Lead Editor · 7 min read

A language model that writes a nice paragraph is useful to a human. A language model that feeds another program needs to do something harder: return data in a shape your code can parse, every time, without surprises. The moment you stop reading the output yourself and start passing it to JSON.parse, prose becomes a liability and structure becomes a requirement. This guide is about closing the gap between "the model usually returns something JSON-ish" and "my pipeline can depend on the model's output," because in production, "usually" is the same as "broken on a schedule."

Why prose is not enough

When a model answers a person, small imperfections vanish into human flexibility. A reader does not care if the answer opens with "Sure, here's that:" or wraps the data in a code fence or labels the fields in slightly different words than last time. A parser cares about all of it. A single stray sentence before the data, a trailing comma, a field that is sometimes a number and sometimes the word "unknown" — any one of these turns a working pipeline into a stack trace.
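To make those failure modes concrete, here is a minimal Python sketch that runs the standard-library json.loads over three hypothetical replies. The first two fail to parse at all. The third parses cleanly but carries a string where a number belongs, which is the kind of problem that only surfaces later, when downstream code tries to do arithmetic with it.

```python
import json

# Three hypothetical model replies a human would read without trouble.
REPLIES = {
    "preamble": "Sure, here's that:\n" + '{"name": "Widget", "price": 9.99}',
    "trailing_comma": '{"name": "Widget", "price": 9.99,}',
    "type_drift": '{"name": "Widget", "price": "unknown"}',
}

def try_parse(text):
    """Return the parsed value, or a short error string if parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return f"JSONDecodeError: {exc.msg}"

for label, text in REPLIES.items():
    print(f"{label}: {try_parse(text)!r}")
```

A successful parse is not a schema check: the "type_drift" reply is valid JSON and passes json.loads without complaint.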

The core problem is that a model is trained to produce plausible text, and plausible text is not the same as valid structured data. Left to its own devices, a model will drift toward being helpful and conversational exactly when you need it to be rigid and machine-readable. Getting reliable structured output is the work of removing that drift. There are several levers, and they stack.

Lever one: ask precisely

The cheapest improvement is also the most often skipped: tell the model exactly what shape you want, and show it. Vague instructions produce vague structure. A request to "return the details as JSON" leaves the model to invent field names, nesting, and types, and it will invent them differently across calls.

Instead, specify the schema concretely. Name every field, state its type, say whether it is required, and — this is the part people omit — show a complete example of a valid response. Models are extraordinarily good at pattern-matching to an example, and a single well-formed sample does more to pin down the format than a paragraph of description. State the rules that matter explicitly too: that the output must be only the JSON with no surrounding text, that a missing value should be represented a specific way rather than omitted or guessed, and that the field set is fixed. Precision in the request is the foundation everything else builds on. Skip it and the later levers are patching a problem you created.
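A prompt that follows this advice can be assembled from fixed parts. The sketch below is a hypothetical product-extraction task in Python; the field names, the allowed currencies, and the exact wording of the rules are illustrative assumptions, not a required format. What matters is that it names every field and type, includes one complete valid example, and states the output rules explicitly.

```python
import json

# Hypothetical task: extract product details from free text.
# Every field is named, typed, and marked required.
SCHEMA_TEXT = """\
Return a JSON object with exactly these three fields:
- "name": string, required
- "price": number or null, required
- "currency": one of "USD", "EUR", "KRW", or null, required"""

# One complete, valid response for the model to pattern-match against.
EXAMPLE = {"name": "Desk lamp", "price": 24.5, "currency": "USD"}

# The rules that matter, stated explicitly rather than implied.
RULES = """\
Output only the JSON object: no introduction, no explanation, no code fence.
If the text does not state a value, use null. Never guess or omit a field.
Do not add, rename, or remove fields."""

def build_prompt(source_text: str) -> str:
    """Assemble schema, example, rules, and input into a single prompt."""
    return "\n\n".join([
        SCHEMA_TEXT,
        "Example of a valid response:\n" + json.dumps(EXAMPLE),
        RULES,
        "Text:\n" + source_text,
    ])

print(build_prompt("A blue ceramic mug, on sale for 8 dollars."))
```

Generating the example with json.dumps, rather than typing it by hand, guarantees the sample the model imitates is itself valid JSON.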

Lever two: use the model's structured-output features

Asking nicely helps, but instructions alone leave room for the model to wander. Most major LLM providers now offer features built specifically for structured output, and using them is a large step up from prompting alone.

These features come in a couple of flavors. The lighter form is a mode that constrains the output to valid JSON — the model is prevented from emitting anything that is not syntactically well-formed, which eliminates the entire category of "it added a sentence" and "it used a code fence" failures. The stronger form lets you supply a schema the output must conform to, so the result is not just valid JSON but valid JSON of the exact shape you asked for, with the right fields and types.

Where these features exist, prefer them over hand-rolled prompting. They move the guarantee from "the model was asked to" toward "the system enforces it," and that shift is the whole game. Check your provider's documentation for what is available and how to invoke it, because the specifics differ, but the principle is constant: let the platform enforce structure rather than relying on the model's goodwill.
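For the stronger, schema-enforcing form, the thing you supply is typically a JSON Schema. Below is a sketch of one for the same hypothetical product fields, written as a Python dict in standard JSON Schema vocabulary. It stops short of the API call on purpose: the parameter that carries the schema, and which JSON Schema keywords a given provider supports, differ between APIs, so treat this as an assumption to check against your provider's documentation.

```python
import json

# A JSON Schema for the exact shape we want back (hypothetical fields).
# How it is attached to a request is provider-specific and not shown here.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": ["number", "null"]},        # null when not stated
        "currency": {"enum": ["USD", "EUR", "KRW", None]},
    },
    # Every field must be present, and no extra fields are allowed.
    "required": ["name", "price", "currency"],
    "additionalProperties": False,
}

# Serialized form, as it would travel in a request body.
print(json.dumps(PRODUCT_SCHEMA, indent=2))
```

Making every field required and representing "not stated" as null, rather than letting fields go missing, keeps the output shape identical on every call.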

Lever three: validate before you trust

Even with the best prompting and the strongest structured-output feature, treat the model's output as untrusted until you have checked it. This is not paranoia; it is the same discipline you would apply to any external input. Validation has two layers, and you want both.

Structural validation confirms the output parses and matches the schema: the right fields are present, types are correct, required values are not missing. This catches the malformed responses that slip past everything upstream.
Semantic validation confirms the content makes sense for your domain: a date is a real date, a category is one of your allowed values, a quantity is in a plausible range, a referenced identifier actually exists. A response can be perfectly valid JSON and still be nonsense, and only domain checks catch that.
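Both layers can live in one function. This Python sketch uses only the standard library and assumes a hypothetical order-extraction task; the allowed categories, the quantity range, and the set of known customer IDs are illustrative stand-ins for your real domain rules and lookups.

```python
from __future__ import annotations

import json
from datetime import date

# Hypothetical domain rules for an order-extraction task.
ALLOWED_CATEGORIES = {"hardware", "software", "service"}
KNOWN_CUSTOMER_IDS = {"C-1001", "C-1002"}  # stand-in for a real database lookup
# Structural rules: field name -> expected Python type after json.loads.
FIELDS = {"order_date": str, "category": str, "quantity": int, "customer_id": str}

def validate_order(raw: str) -> tuple[dict | None, list[str]]:
    """Return (data, []) if every check passes, else (None, errors)."""
    # Structural layer: it parses, it is an object, the field set is exact.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value is not an object"]
    errors = [f"unexpected field: {k}" for k in data if k not in FIELDS]
    for field, expected in FIELDS.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif not isinstance(data[field], expected) or isinstance(data[field], bool):
            errors.append(f"wrong type for field: {field}")
    if errors:
        return None, errors  # semantic checks assume the structure is sound

    # Semantic layer: the values make sense for this domain.
    try:
        date.fromisoformat(data["order_date"])
    except ValueError:
        errors.append("order_date is not a real date")
    if data["category"] not in ALLOWED_CATEGORIES:
        errors.append("category is not an allowed value")
    if not 1 <= data["quantity"] <= 10_000:
        errors.append("quantity is outside the plausible range")
    if data["customer_id"] not in KNOWN_CUSTOMER_IDS:
        errors.append("customer_id does not exist")
    return (None, errors) if errors else (data, [])
```

Collecting every error instead of stopping at the first one is a deliberate choice: the list doubles as specific feedback you can send back to the model on a retry.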

Run validation as a gate, not an afterthought. Output that fails the gate should never reach the rest of your system as if it were fine. What you do at the gate is the subject of the next lever.

Lever four: handle the failures you will still get

No combination of the above is perfect, so design for the residual failures instead of pretending they will not happen. When validation fails, you have a few sane options, roughly in order of preference.

The first is a bounded retry. Many structured-output failures are one-off, and simply asking again — ideally telling the model what was wrong with its previous attempt — succeeds. Bound the retries so a persistent failure does not loop forever. The second, for minor and predictable issues, is repair: trimming a stray code fence, fixing obvious formatting, coercing a near-miss into the expected shape. Keep repair narrow and conservative, because aggressive auto-fixing hides real problems and can corrupt data. The third, when retries and repair are exhausted, is a clean, logged failure — route the case to a fallback path or human review rather than passing bad data downstream, and log it so you can see patterns. A field that fails validation constantly is telling you to fix your prompt or schema, not to add another repair rule.
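A minimal Python sketch of the three options working together. Here call_model is a hypothetical stand-in for whatever function sends a prompt to your provider and returns the reply text, and validate is any function that returns a (data, errors) pair. The repair step is intentionally narrow: it trims whitespace and unwraps a reply that consists of a single fenced block, and nothing more.

```python
import json
import logging
import re
from typing import Callable, Optional

log = logging.getLogger("structured_output")

# Matches a reply that is nothing but one fenced block, e.g. ```json ... ```.
_FENCE = re.compile(r"^```(?:json)?\s*\n(.*)\n```$", re.DOTALL)

def narrow_repair(text: str) -> str:
    """Conservative repair: trim whitespace and one wrapping code fence only."""
    text = text.strip()
    match = _FENCE.match(text)
    return match.group(1) if match else text

def get_structured(
    prompt: str,
    call_model: Callable[[str], str],  # hypothetical: your client call
    validate: Callable[[str], tuple],  # returns (data, errors)
    max_attempts: int = 3,
) -> Optional[dict]:
    """Bounded retry with error feedback; returns None on a clean failure."""
    current_prompt = prompt
    errors: list = []
    for attempt in range(1, max_attempts + 1):
        reply = narrow_repair(call_model(current_prompt))
        data, errors = validate(reply)
        if not errors:
            return data
        log.warning("attempt %d failed validation: %s", attempt, errors)
        # Tell the model what was wrong with its previous attempt.
        current_prompt = (
            prompt
            + "\n\nYour previous response was rejected for these reasons: "
            + "; ".join(errors)
            + ". Return only corrected JSON."
        )
    # Exhausted: log and hand off instead of passing bad data downstream.
    log.error("giving up after %d attempts: %s", max_attempts, errors)
    return None
```

Returning None rather than raising is one choice among several. The essential property is that the caller receives an explicit failure to route to a fallback or review queue, never half-checked data.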

Keep the schema as simple as the job allows

A quieter lever, worth its own mention: the complexity of what you ask for directly affects how reliably you get it. Deeply nested objects, long lists of optional fields, and elaborate conditional structures are all harder for a model to produce consistently than a flat, small, required-fields-only shape. Before reaching for clever prompting to wrangle a baroque schema, ask whether the schema needs to be that baroque. Often you can split one complicated extraction into two simple ones, or flatten a structure that was nested for tidiness rather than necessity. The most reliable structured output is the structure you did not overcomplicate.
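One way to flatten, sketched in Python with a hypothetical invoice-vendor example: ask the model for a single level of required fields, then rebuild any nesting you want for tidiness in ordinary code, where the transformation is deterministic and cannot drift between calls. The schema uses standard JSON Schema keywords; whether your provider accepts all of them is worth confirming.

```python
# Flat shape the model is asked for: one level, every field required.
FLAT_VENDOR_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string"},
        "vendor_city": {"type": "string"},
        "vendor_country": {"type": "string"},
    },
    "required": ["vendor_name", "vendor_city", "vendor_country"],
    "additionalProperties": False,
}

def to_nested(flat: dict) -> dict:
    """Rebuild the tidy nested shape in plain code, not in the model's output."""
    return {
        "vendor": {
            "name": flat["vendor_name"],
            "address": {
                "city": flat["vendor_city"],
                "country": flat["vendor_country"],
            },
        }
    }
```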

The takeaway

Reliable JSON from a model is a stack of reinforcing habits, not a single trick. Ask precisely and show an example. Use your provider's structured-output features so the platform enforces the shape rather than the model merely intending it. Validate every response — structurally and semantically — and treat it as untrusted until it passes. Design for the failures that remain with bounded retries, narrow repair, and clean fallbacks. And keep the schema no more complex than the job requires. Do these together and you cross the line that matters in production: from a model that usually returns parseable data to a pipeline that can actually depend on it.

#structured-output #json #schema #validation

Primary sources

OpenAI API documentation
Anthropic documentation