
Tool use / function calling

In one line: You give the model a list of functions it's allowed to call (name, description, JSON-schema parameters). The model can emit "I want to call get_weather(city='SF')" instead of plain text. Your code runs the function, returns the result, and the conversation continues.

In plain English

A bare LLM is a brain in a jar — it can talk but can't do anything. Tools are the hands. You tell the model "here's a list of buttons you can press, each does X with inputs Y," and it picks the right button at the right time. Your code presses the button. That single mechanism is the foundation of every "AI agent" you've heard of.

The shape

from openai import OpenAI
client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'San Francisco'"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius"}
            },
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

If the model decides to call a tool, the response contains a tool_calls field instead of plain content:

response.choices[0].message.tool_calls
# [ToolCall(id='call_abc', function=Function(name='get_weather', arguments='{"city":"Tokyo"}'))]

Your code:

  1. Parses the tool call.
  2. Executes the function (with the model-provided arguments).
  3. Sends the result back as a tool role message.
  4. Calls the model again — it now has the result and can either call another tool or produce a final answer.
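
The same round-trip as a sequence between the user, your app, the LLM, and the weather API:

  • User → Your app: "Weather in Tokyo?"
  • Your app → LLM: messages + tools
  • LLM → Your app: tool_call: get_weather(city='Tokyo')
  • Your app → Weather API: GET /weather?city=Tokyo
  • Weather API → Your app: {temp: 22, condition: 'sunny'}
  • Your app → LLM: messages + tool_result
  • LLM → Your app: "It's 22°C and sunny in Tokyo."
  • Your app → User: render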

That single turn-by-turn loop is the foundation of every agent.

Worked example: one full tool round-trip

import json

def get_weather(city: str, units: str = "celsius") -> dict:
    # pretend this hits a real API
    return {"city": city, "temp": 22, "units": units, "condition": "sunny"}

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
response = client.chat.completions.create(model="gpt-5-mini", messages=messages, tools=tools)
msg = response.choices[0].message

if msg.tool_calls:
    messages.append(msg)  # keep the assistant's tool-request turn
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        result = get_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    # Second call: now the model has the tool result
    final = client.chat.completions.create(model="gpt-5-mini", messages=messages, tools=tools)
    print(final.choices[0].message.content)
    # "It's 22°C and sunny in Tokyo right now."
else:
    print(msg.content)  # the model answered directly, no tool needed

Three roles, two API calls, one tool execution. Wrap that in a while loop and you have an agent (see agent loop).
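
To make the "wrap it in a loop" idea concrete, here is a minimal sketch that runs offline. The fake_model function is a hypothetical stand-in for the chat completion call, and the flat dict shape of its tool calls is a simplification of what the SDK returns, so treat both as assumptions rather than the real API. The loop is bounded by max_steps instead of being a bare while, so a confused model cannot spin forever.

```python
import json

def get_weather(city: str, units: str = "celsius") -> dict:
    # Stand-in for a real API call, as in the worked example.
    return {"city": city, "temp": 22, "units": units, "condition": "sunny"}

TOOL_FUNCTIONS = {"get_weather": get_weather}

def fake_model(messages: list[dict]) -> dict:
    """Hypothetical model: asks for get_weather once, then answers."""
    last = messages[-1]
    if last["role"] == "tool":
        data = json.loads(last["content"])
        text = f"It's {data['temp']} degrees and {data['condition']} in {data['city']}."
        return {"role": "assistant", "content": text, "tool_calls": None}
    call = {"id": "call_1", "name": "get_weather",
            "arguments": json.dumps({"city": "Tokyo"})}
    return {"role": "assistant", "content": None, "tool_calls": [call]}

def run_agent(user_text: str, model=fake_model, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)  # keep the assistant turn in the history
        if not reply["tool_calls"]:
            return reply["content"]  # no tool requested: this is the final answer
        for call in reply["tool_calls"]:
            func = TOOL_FUNCTIONS[call["name"]]
            result = func(**json.loads(call["arguments"]))
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": json.dumps(result)})
    raise RuntimeError(f"no final answer after {max_steps} steps")

print(run_agent("What's the weather in Tokyo?"))
```

Swapping fake_model for a function that calls your provider is the only change needed to point this at a real model. The loop itself stays the same.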

Why this is a big deal

  • The LLM becomes a controller. It picks which of your functions to call, with which arguments, based on the user's request. You don't write a router; the model is the router.
  • External data and side effects are now accessible without you parsing free-text intents.
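  • Composes with structured output. A tool's parameters are a JSON schema, so arguments arrive as structured JSON rather than free text. Conformance is only guaranteed when the provider enforces the schema (for example, OpenAI's strict mode); otherwise treat the arguments as untrusted input and validate them.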
  • Tools are how LLMs grow new abilities. Code interpreter, web search, database access, file I/O — all just tool definitions.

Patterns

  • Single-shot tool call — the model calls one tool, gets the result, answers. Most "AI features" are this.
  • Multi-tool selection — give the model 3–10 tools, let it pick. Don't go past ~30 unless you've tested it; selection accuracy degrades.
  • Parallel tool calls — most modern providers support emitting multiple tool calls in one response. Execute them concurrently for latency. See function calling deep.
  • Forced tool choice: tool_choice="required" forces the model to call some tool, and tool_choice={"type": "function", "function": {"name": "X"}} forces one specific tool (Chat Completions format). Useful for guaranteed-structured outputs.
  • Agent loop — keep calling the model with new tool results until it stops requesting tools (see The agent loop).
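
For the parallel pattern, one possible sketch uses a thread pool to run every requested call at once and returns the tool messages in the same order the model asked for them. The two tools and the flat call dicts are hypothetical. With the OpenAI SDK you would read call.function.name and call.function.arguments instead.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def get_weather(city: str) -> dict:
    # Hypothetical stand-in for a slow network call.
    return {"city": city, "temp": 22}

def get_time(city: str) -> dict:
    # Hypothetical second tool.
    return {"city": city, "time": "09:00"}

TOOL_FUNCTIONS = {"get_weather": get_weather, "get_time": get_time}

def run_call(call: dict) -> dict:
    """Execute one tool call and wrap the result as a tool message."""
    func = TOOL_FUNCTIONS[call["name"]]
    result = func(**json.loads(call["arguments"]))
    return {"role": "tool", "tool_call_id": call["id"],
            "content": json.dumps(result)}

def run_calls_concurrently(calls: list[dict]) -> list[dict]:
    # pool.map preserves input order, so results line up with the calls.
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        return list(pool.map(run_call, calls))

calls = [
    {"id": "call_1", "name": "get_weather", "arguments": '{"city": "Tokyo"}'},
    {"id": "call_2", "name": "get_time", "arguments": '{"city": "Tokyo"}'},
]
tool_messages = run_calls_concurrently(calls)
print([m["tool_call_id"] for m in tool_messages])
```

Threads suit I/O-bound tools such as HTTP requests. If your tools are async functions, asyncio.gather is the natural equivalent.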

Designing good tools

The model picks tools based on:

  1. Tool name — short, action-y verb. search_docs, not DocumentSearchService_v2_query.
  2. Description — the most important field. Write it like a docstring for a junior dev. Explain when to use this tool and when not to.
  3. Parameter descriptions — clarify units, formats, gotchas. "city": "City name including country if ambiguous, e.g. 'Paris, France'".
  4. Enum values — use them aggressively. The model can't typo an enum.

A good tool description can be the difference between roughly 60% and 95% tool selection accuracy.
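
As an illustration of the four points above, here is what a definition might look like for a hypothetical search_docs tool. The tool, its sections, and its limits are invented for the example. Only the overall shape follows the get_weather definition earlier on this page.

```python
search_docs_tool = {
    "type": "function",
    "function": {
        # 1. Short verb-first name.
        "name": "search_docs",
        # 2. Description says when to use the tool and when not to.
        "description": (
            "Search the product documentation by keyword. "
            "Use this when the user asks how a feature works or how to configure it. "
            "Do not use it for account or billing questions."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                # 3. Parameter descriptions spell out format and gotchas.
                "query": {
                    "type": "string",
                    "description": "Search keywords, e.g. 'reset API key'. Keywords, not a full sentence.",
                },
                # 4. An enum restricts the value to known options.
                "section": {
                    "type": "string",
                    "enum": ["guides", "api_reference", "changelog"],
                    "description": "Which part of the docs to search. Omit to search everything.",
                },
                "max_results": {
                    "type": "integer",
                    "description": "How many results to return, between 1 and 10.",
                },
            },
            "required": ["query"],
        },
    },
}
```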

What beginners get wrong

Common mistakes
  • Vague descriptions. "Searches stuff" tells the model nothing. Write 2–3 sentences explaining when this tool wins.
  • Too many tools. Past ~20–30 tools, selection accuracy tanks. Group related actions, use a routing first-step, or use multi-agent.
  • Tools with overlapping responsibilities. Two tools that both "look up users" → the model picks randomly. Make boundaries crisp.
  • Not validating arguments. Schema constrains shape, not semantics. If email must be a real email, validate before acting.
  • No cap on tool-use loops. A confused model can call the same tool over and over. Set a max-iteration cap.
  • Forgetting to send tool_call_id back. Each tool result must reference the call ID it's responding to, or the provider rejects it.
  • Stringifying complex tool results poorly. A 50KB JSON dump as a tool result wastes tokens. Return only what the model needs.
  • Treating tool errors as fatal. Always return errors as tool results so the model can recover ({"error": "city not found, try again"}) instead of crashing.
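
Two of these mistakes, unvalidated arguments and fatal tool errors, can be handled in one small wrapper. The sketch below assumes a hypothetical send_invite tool and a deliberately loose email check. The point is the pattern: every path returns a JSON string that can go straight into a tool message.

```python
import json
import re

# Loose shape check only; it does not prove the address exists.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def send_invite(email: str) -> dict:
    # Hypothetical tool with a side effect; here it only echoes.
    return {"status": "sent", "email": email}

def handle_tool_call(name: str, arguments: str) -> str:
    """Run one tool call and always return JSON for the tool message."""
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError:
        return json.dumps({"error": "arguments were not valid JSON"})
    if name != "send_invite":
        return json.dumps({"error": f"unknown tool: {name}"})
    email = args.get("email") if isinstance(args, dict) else None
    # The schema constrains shape, not semantics, so check the value itself.
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        return json.dumps({"error": "email is not a valid address, ask the user to confirm it"})
    try:
        return json.dumps(send_invite(email))
    except Exception as exc:
        # Report the failure to the model instead of crashing the loop.
        return json.dumps({"error": f"send_invite failed: {exc}"})

print(handle_tool_call("send_invite", '{"email": "a@example.com"}'))
print(handle_tool_call("send_invite", '{"email": "not-an-email"}'))
```

Because errors come back as ordinary tool results, the model can correct the argument or ask the user, and your loop never has to special-case failures.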

Tool result format that works

# Bad
{"role": "tool", "content": str(db.query(...).all())}

# Good
{"role": "tool", "tool_call_id": call.id, "content": json.dumps({
    "results": [{"id": r.id, "title": r.title} for r in rows[:5]],
    "total_count": total,
    "truncated": total > 5,
})}

Return JSON, paginate, indicate truncation, surface errors as structured fields. The model uses what you give it.

Highlight: tools are how LLMs grew up

Pre-2023 LLMs could only emit text. Tools turned them from "fancy autocomplete" into systems that can search, compute, and act. Every agent framework, every Cursor-style coding assistant, every retrieval-augmented chat is downstream of this one primitive.

→ Going deeper: For the production discipline — tight tool sets, description craft, parallel execution, structured errors, and human confirmation on destructive actions — see Tool use done right.


→ Next: Function calling, deep