Function calling and tools: connecting models to actions

Function calling lets a model decide to use your code — without ever running it. Here is what actually happens, and where it goes wrong.

tools2026-05-12 12:05 KST·Lead Editor·7 min read

On its own, a language model can only produce text. It cannot look up today's weather, query your database, send an email, or do arithmetic reliably. Function calling — also called tool use — is the mechanism that bridges that gap, letting a model reach beyond text into real actions and live data. It is the foundation of nearly every AI agent and the single most important capability for building applications that do things rather than just talk about them. The mechanics surprise people, though, because the model never actually runs your code. This explainer walks through what really happens, and where it goes wrong.

The gap function calling fills

A language model is, at its core, a text generator. That makes it excellent at language and unreliable at everything that requires current information or real-world effect. Ask it for the weather and it will produce plausible-sounding text that is not connected to any thermometer. Ask it to do precise arithmetic and it may guess. Ask it to fetch a customer's order and it has no way to look.

Function calling solves this by giving the model access to tools — functions you define and control — that can do these things: hit an API, run a calculation, query a database, trigger an action. The model's job is to decide when a tool is needed and with what inputs. Your code's job is to actually run it. That division of labor is the whole idea, and getting it straight is the key to understanding everything that follows.

What actually happens (the part that surprises people)

The most common misconception is that the model executes functions. It does not. The model only ever produces text — including, when appropriate, a structured request that says it wants a tool run. Your application does the running. The flow goes like this:

You describe the available tools to the model — their names, what they do, and the inputs they expect.
The user asks something. The model decides whether answering well requires a tool.
If so, the model returns a structured message: "call this tool with these arguments." It does not run anything.
Your code parses that request, runs the actual function, and gets a result.
You send the result back to the model.
The model uses that result to compose its final answer.

The model is the decision-maker; your application is the hands. Keeping that boundary clear is essential, because everything about security and reliability depends on remembering that you control execution, not the model.

Describing tools so the model uses them well

A tool is only as useful as its description, because the model decides whether and how to use it based entirely on what you tell it. A tool needs a clear name, an explanation of what it does and when to use it, and a specification of its inputs — what arguments it takes, their types, and which are required.

The quality of these descriptions directly determines behavior. A vaguely described tool gets used at the wrong times or with malformed arguments; a clearly described one gets used appropriately. It helps to think of the description as instructions to a capable assistant who can see only what you wrote — if a human could not tell from your description when to use the tool, neither can the model. The provider documentation from Anthropic and OpenAI specifies the exact format for declaring tools, and following it precisely is what lets the model return well-formed calls your code can act on.

Why this unlocks agents

Function calling is the building block beneath the entire idea of an AI agent. An agent is, roughly, a loop: the model is given a goal and a set of tools, it decides on an action, your code executes it, the result feeds back, and the model decides the next action — repeating until the task is done.

Each turn of that loop is a function call. The model surveys its tools, picks one, you run it, and the outcome shapes the next decision. This is how a system can break a vague request like "find the cheapest flight and book it" into a sequence of concrete tool uses — search, compare, reserve — none of which the model performs itself, all of which it orchestrates. Once you see that an agent is function calling in a loop, the seemingly magical behavior of agentic systems becomes far less mysterious: it is the same six-step flow, run repeatedly toward a goal.

Where it goes wrong

Function calling introduces failure modes that text-only generation does not, and anticipating them is most of building a robust system.

The model picks the wrong tool or wrong arguments. It decides based on your descriptions, which means poor descriptions produce poor decisions. Vague or overlapping tools are a frequent culprit.
Tools fail. The API is down, the query errors, the input is invalid. Your code must handle tool failures and decide what to report back to the model, which can often recover gracefully if told what went wrong.
The model invents arguments. Asked to call a tool without all the information it needs, a model may fill in plausible but wrong values. Validate arguments before acting on them; never trust them blindly.
Loops run away. In an agent setting, a model can get stuck calling tools without converging. Guardrails on the number of steps keep a confused agent from running forever.

These are not exotic edge cases — they are the normal texture of building with tools, and a production system has to handle every one of them.

The security boundary you cannot skip

Because tools take real actions, function calling is also a security surface, and it is the one place where carelessness is most expensive. The governing principle is that the model's request to call a tool is untrusted input. It is shaped by the user's prompt, and a user can try to steer the model into calling tools in ways you did not intend.

That means you never give the model a tool you would not let an untrusted user trigger with arbitrary arguments. Validate every argument before acting. Scope what each tool can do as narrowly as the job allows. Apply the same authorization and rate limits you would apply to any other path into your systems. A tool that deletes records or spends money demands far more caution than one that reads a public data feed. The model decides; your code must verify before it acts.

The takeaway

Function calling is what turns a text generator into a system that acts — it lets the model decide which of your tools to use and with what inputs, while your code does the actual running. That boundary is the whole point: the model is the decision-maker, your application is the hands, and you remain in control of execution. Describe your tools clearly enough that a capable assistant could use them from the description alone, handle the inevitable wrong picks and tool failures, and treat every tool call as untrusted input that you validate before acting. Get that right and function calling becomes the foundation of agents and of any application that does real work — built on one simple six-step flow you can reason about from end to end.

#function-calling#tools#agents#integration

Primary sources

Anthropic documentation OpenAI API documentation