Privacy and LLMs: what leaves your machine

When you type into an LLM, where does that text actually go — and what happens to it after? A plain-language guide to the data trail.

policy · 2026-06-14 17:56 KST · Lead Editor · 7 min read

Every time you paste a document into a chatbot or wire an LLM into an app, you are making a privacy decision — usually without realizing it. The text you send does not vanish after the answer comes back. It travels somewhere, gets processed by someone's systems, and may be stored, logged, or reused depending on terms you probably did not read. This piece explains, in plain language, what actually leaves your machine when you use an LLM, and how to reason about it.

The basic data trail

Start with the simplest case: a hosted chatbot. When you type a prompt and hit send, that text leaves your device, crosses the network, and arrives at the provider's servers, where the model runs. The response makes the return trip. So the first thing to internalize is that with any cloud-based model, your input leaves your machine by design, because the service cannot work any other way. The model is not on your laptop; your words go to it.

This matters because people treat a chat box like a private notepad. It is not. It is more like sending a letter to a company that opens it, processes it, and decides what to do with the contents according to its own policies. The interface feels personal and local; the reality is a round trip to someone else's infrastructure.

Three things that can happen to your input

Once your text reaches the provider, three broad outcomes are possible, and they are not mutually exclusive:

Processing. At minimum, the input is processed to generate a response. This is unavoidable and usually transient.
Logging and retention. The provider may store your inputs and outputs — for debugging, abuse detection, support, or legal compliance. Retention periods vary widely and are set by policy, not by you.
Reuse for improvement. Some providers may use submitted content to improve their systems, unless you opt out or are on terms that forbid it. This is the outcome people most often worry about, and the one you can most readily control through settings and account type.

The durable lesson is that these are policy choices, not laws of nature. Two providers handling identical text can do completely different things with it. The only way to know is to check the terms and settings for the specific service and account you are using.

Consumer vs business terms are different worlds

One of the most important distinctions is between consumer products and business or developer offerings. Free consumer tools often have the most permissive data terms, because the implicit trade is your data for the free service. Paid business tiers and API access frequently come with stricter commitments: shorter retention, no training on your content by default, and contractual data-handling terms.

So the same brand can offer very different privacy postures depending on which door you walk through. If you are handling anything sensitive, the question is not "do I trust this company?" but "which specific product and plan am I on, and what does that tier promise in writing?" Sensitive work belongs on terms that match its sensitivity.

The special danger: data you should never have sent

The thorniest privacy problems with LLMs are not exotic. They come from ordinary people pasting things they should not: customer records, employee data, unreleased financials, secrets, source code, health details, someone else's personal information. Once that text leaves your machine, you cannot recall it, and you may have violated a contract, a regulation, or someone's trust regardless of what the provider does next.

The principle to hold onto: treat anything you put into a hosted model as potentially leaving your control. Before you paste, ask whether you would be comfortable handing this exact text to an outside vendor, because functionally that is what you are doing. For regulated or confidential data, that question often answers itself.

When the model runs locally

There is one configuration where the trail is genuinely different: running a model on your own hardware. With a local model, the inference happens on your machine, so your input does not leave it to be processed elsewhere. For privacy-sensitive work, this is the strongest structural guarantee, because you are not relying on a provider's promises — the data simply does not go anywhere.

The trade-offs are real: local models are often smaller and less capable than the largest hosted ones, and you take on the work of running and securing them. But the privacy story is clean. If "what leaves your machine" must be "nothing," local inference is the honest way to get there. Self-hosting in your own cloud environment sits in between: your data stays within infrastructure you control, but securing that infrastructure is your responsibility.
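One way to make the local-versus-hosted choice explicit in code is a small routing rule. The sketch below is only an illustration: the sensitivity labels, the function name, and the rule itself are assumptions, not something any provider or standard prescribes. It sends restricted data to a local model and allows hosted models only when the terms have been reviewed.

```python
from enum import Enum


class Sensitivity(Enum):
    # Hypothetical labels; substitute your organization's own classification.
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


def choose_destination(level: Sensitivity, hosted_terms_approved: bool) -> str:
    """Return "local" or "hosted" for a piece of text with the given label."""
    # Restricted data never leaves the machine, whatever the hosted terms say.
    if level is Sensitivity.RESTRICTED:
        return "local"
    # Internal data goes out only if the hosted plan's terms were reviewed.
    if level is Sensitivity.INTERNAL and not hosted_terms_approved:
        return "local"
    return "hosted"


print(choose_destination(Sensitivity.RESTRICTED, hosted_terms_approved=True))
print(choose_destination(Sensitivity.INTERNAL, hosted_terms_approved=True))
```

The point of writing the rule down is that the decision stops depending on whoever happens to be pasting text that day.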

The third parties behind the provider

Even when you trust the provider you signed up with, your data may touch more hands than the brand name suggests. Many AI services run on cloud infrastructure they do not own, route requests through intermediaries, or rely on subprocessors for parts of the pipeline. Your text does not necessarily stay inside one company; it can move through a chain of vendors, each operating under its own arrangements.

This is not inherently sinister, since almost all modern software works this way, but it matters for reasoning about privacy. The promise you are relying on is only as strong as the weakest link in that chain, which is why the contractual terms a serious provider offers usually cover its subprocessors. The principle for sensitive work is to prefer providers who are transparent about who else handles your data and who commit, in writing, to passing their obligations down the chain. Opacity about subprocessors is itself a signal worth noticing.

Inputs, outputs, and metadata

When people picture LLM privacy, they think about the prompt. But the full footprint is wider. The output can be sensitive too — a model's response may restate or infer things about the people in your input. And around both sits metadata: who made the request, when, from where, how often. That surrounding data can be revealing even when the content itself is mundane.

The takeaway is to think in terms of the whole interaction, not just the words you typed. A system that carefully protects prompts but logs detailed metadata, or stores rich outputs without the same care, has only solved half the problem. Privacy is a property of the entire data flow — input, output, and the trail of metadata that documents it — so the protections you apply should cover all three rather than just the part that feels obviously confidential.
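To make "cover all three" concrete, here is a minimal sketch of a log entry that treats input, output, and metadata with the same caution. Every name in it is hypothetical, and the choices (a salted hash for the user, an hour-level timestamp, lengths instead of text) are assumptions about what a team might need, not a recommendation for every system. Note that a salted hash is pseudonymization rather than anonymization: anyone holding the salt can re-link the records.

```python
import hashlib
from datetime import datetime, timezone


def build_log_record(user_id: str, prompt: str, response: str,
                     when: datetime, salt: str) -> dict:
    """Build a log entry that records sizes and a coarse time, not content."""
    # Pseudonymize the user with a salted hash so raw IDs stay out of logs.
    digest = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
    return {
        "user_ref": digest[:16],
        # Keep only the hour: enough for usage trends, less revealing.
        "hour": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:00Z"),
        # Store lengths instead of the prompt and response themselves.
        "prompt_chars": len(prompt),
        "response_chars": len(response),
    }


record = build_log_record(
    user_id="user-123",
    prompt="Summarize the attached contract.",
    response="The contract covers three obligations.",
    when=datetime(2026, 6, 14, 8, 56, 30, tzinfo=timezone.utc),
    salt="example-salt",  # assumption: a secret kept outside the log store
)
print(record)
```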

Building privacy into an LLM app

If you are putting an LLM inside a product, the privacy question becomes a design responsibility, not just a personal habit:

Minimize what you send. Strip or mask data the model does not need. The safest data is the data you never transmit.
Choose terms deliberately. Use plans and providers whose data commitments match your obligations, and keep the agreements on file.
Be transparent with users. Tell people when their input goes to a third-party model and what happens to it. Surprise is the enemy of trust.
Guard the logs. Your own logs of prompts and responses are now sensitive data too. Secure and retain them with the same care as any user data.
Plan for deletion. Know how to honor a deletion request across both the provider and your own systems before someone asks.
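The first item on that list, minimizing what you send, can be sketched as a masking step that runs before any text reaches the provider. The patterns and the function name below are assumptions chosen for illustration; they catch only simple email addresses and phone-like numbers, and real data would need broader, tested rules.

```python
import re

# Hypothetical patterns for illustration; they are deliberately simple.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_prompt(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


masked = mask_prompt("Contact Jin at jin.park@example.com or +82 10-1234-5678.")
print(masked)  # Contact Jin at [EMAIL] or [PHONE].
```

Masking like this lowers the risk but does not remove it: names, addresses, and free-text details pass straight through, so it complements the other items on the list rather than replacing them.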

The takeaway

The privacy of an LLM comes down to a simple chain: your text leaves your machine, a provider processes it, and policy — not the chat interface — decides what happens next. Hosted models always involve that round trip; what differs is retention, reuse, and the terms of the specific product and plan you are on. The biggest risks come from sending data you never should have, because you cannot take it back. Reason about it deliberately: minimize what leaves, match your terms to your sensitivity, run locally when the data must not travel, and treat every paste as handing text to an outside party. Privacy with LLMs is not magic — it is knowing where your words go.

#privacy #llms #data #security
