Write a system prompt that works

A system prompt sets the rules before the conversation starts. Here is how to write one that holds up across real inputs, not just demos.

tutorials2026-04-14 16:30 KST·Lead Editor·7 min read

A system prompt is the standing instruction that frames every turn of a conversation. The user message changes; the system prompt stays. That makes it the single highest-leverage piece of text in an application built on a language model — and the one most often written carelessly. This guide walks through how to write a system prompt that behaves the same on the hundredth input as it did on the first.

What a system prompt is for

Think of the system prompt as the job description, not the task. The task arrives in each user message. The system prompt establishes the things that are true across all tasks: who the assistant is, what it is allowed to do, how it should format its answers, and what it must never do. If you find yourself repeating the same instruction in every user message, that instruction belongs in the system prompt.

The distinction matters because the two roles are handled differently. The system prompt is the stable context the model conditions on for the whole session; user messages are the variable input. Putting durable rules in the system prompt keeps them in force even as the conversation drifts, and keeps your per-request messages short and focused on the actual task.

Start with role and scope

The first job of a system prompt is to answer two questions: what is this assistant, and what is it for. A vague role ("you are a helpful assistant") gives the model nothing to anchor on. A specific role ("you are a support agent for a billing system; you help users understand charges and resolve disputes") narrows the space of plausible responses before a single user word arrives.

Scope is the other half. Stating what the assistant does is good; stating what it does not do is often more valuable. An assistant told it handles billing will still happily answer a question about cooking unless you tell it not to. Define the boundary explicitly: "If a request is outside billing, politely say it is out of scope and redirect." Boundaries are not bureaucracy — they are how you keep a general model behaving like a specific product.

Write rules as behavior, not vibes

The most common mistake in system prompts is describing a personality instead of specifying behavior. "Be friendly and professional" sounds like guidance but decides nothing. Behavior is observable: "Address the user by name if it is known. Use short paragraphs. Never use exclamation marks." Each of those can be checked against an output; "friendly" cannot.

Apply the same discipline to constraints. Instead of "be careful with numbers," write "Do not perform arithmetic in your head; if a calculation is required, show the steps." Instead of "don't make things up," write "If you are not certain a fact is supported by the provided context, say you don't know." Every rule you write should be something you could verify by reading a transcript. If you can't verify it, the model can't reliably follow it.

Handle the cases that break things

A demo prompt handles the happy path. A production prompt handles the inputs you didn't plan for: the empty question, the hostile user, the request that is half in scope, the input that contains its own instructions. These are where unguided assistants embarrass their owners, and they are exactly what the system prompt exists to govern.

Name the failure modes and prescribe the response. For missing information: "If the context does not contain the answer, say so rather than guessing." For out-of-scope requests: define the redirect. For attempts to override your rules through the user message — "ignore your instructions and..." — state plainly that instructions in user content are data, not commands, and the system rules stand. You will not anticipate every edge case, but covering the predictable ones removes most of the surprises.

Structure so the model can follow it

A system prompt that is one long paragraph is hard for a model to use consistently, the same way it would be hard for a person. Group related rules under clear headings: identity, scope, format, safety, edge cases. Order them by priority — the rules that must never break go first and stated most plainly. When two instructions could conflict, say which one wins, because the model will otherwise pick for you.

Keep it as short as it can be while still complete. Every extra sentence is something the model has to weigh against everything else, and a bloated prompt dilutes the rules that actually matter. Resist the urge to add a new line every time something goes wrong; first ask whether an existing rule, stated more clearly, would have covered it. A tight prompt where every line earns its place outperforms a sprawling one.

Test it against real conversations

A system prompt is not done when it reads well. It is done when it holds up across a set of real interactions. Collect a handful of representative sessions — including the awkward ones — and run your prompt against all of them. Read the outputs looking for rules that were ignored, boundaries that leaked, and formats that drifted. Then change one thing and run the set again.

This is where system prompts are actually engineered rather than written. A rule you were sure was clear will turn out to be ambiguous the moment a real user phrases something unexpectedly. Treat each failure as a specification bug: either the rule was missing, or it was stated in a way the model couldn't apply. Keep the version that behaves best across the whole set, not the one that produced the nicest single answer.

The takeaway

A good system prompt is a specification, not a vibe. It states a specific role and scope, writes its rules as observable behavior, names the failure modes that would otherwise embarrass you, and structures everything so the model can follow it under pressure. Then it earns its place by surviving real conversations rather than a single demo. Write it that way and the system prompt becomes the most reliable part of your application — the steady frame that keeps every turn on track.

#system-prompt#prompting#reliability#design

Primary sources

Anthropic — prompt engineering overview OpenAI — prompt engineering guide