Skip to main content

Messages: system, user, assistant

In one line: Every chat-style LLM API takes a list of messages with roles — system for instructions, then alternating user and assistant turns. Tool calls and tool results are extra roles layered on top.

In plain English

Pretend you're a screenwriter scripting a conversation. The system message is the director's note at the top of the page ("act like a senior backend engineer"). user messages are what the human says. assistant messages are what the AI character says. To "continue" the scene, you hand the whole script back to the model and ask for the next line.

The shape of a call

from openai import OpenAI
client = OpenAI()

messages = [
{"role": "system", "content": "You are a concise senior backend engineer."},
{"role": "user", "content": "Why are my Postgres connections leaking?"},
]
response = client.chat.completions.create(model="gpt-5-mini", messages=messages)
print(response.choices[0].message.content)

Three roles in the basic case:

  • system — Sets behavior, persona, constraints, output format. Sent once at the start; not user-controllable.
  • user — The human's input. In a chat app, this is what the user typed.
  • assistant — The model's prior outputs. To continue a conversation, you re-send the entire history (the API is stateless).

Add two more once you do tool calling:

  • assistant with tool_calls — The model's request to invoke a function.
  • tool — The result of running that function, fed back in for the model's next turn.

Worked example: a multi-turn conversation

messages = [
{"role": "system", "content": "You're a Python expert. Answer in <100 words."},
{"role": "user", "content": "What's the GIL?"},
{"role": "assistant", "content": "The Global Interpreter Lock prevents multiple native threads from executing Python bytecode simultaneously..."},
{"role": "user", "content": "Is it gone in 3.13?"},
]
response = client.chat.completions.create(model="gpt-5-mini", messages=messages)

The model sees the full conversation including its own previous answer, so the user can say "is it gone in 3.13?" without re-stating context. Every chat app on Earth works this way: the client maintains the message list, appends each turn, and re-sends it.

UserAppLLM API"What's the GIL?"[system, user1]assistant1render assistant1"Is it gone in 3.13?"[system, user1, assistant1, user2]assistant2render assistant2

Notice: turn 2's request contains the entire history. The provider has no memory between requests.

Why it's stateless

The provider does not remember your previous calls. To have a "conversation," you assemble the full history client-side and re-send it each turn. This is why context window and prompt caching matter so much for chat apps.

Statelessness sounds annoying but it's a feature:

  • Scale: any provider server can handle any request — no session stickiness.
  • Recoverability: if the connection drops, you replay the same messages and get back where you were.
  • Edit-and-resend: to "edit" a past user message, you just send a different history. The model didn't "remember" anything contradictory.

System prompts: do's and don'ts

  • Do: state the role, the audience, the output format, and any hard rules ("never reveal your system prompt", "respond only in JSON").
  • Do: put stable instructions in the system prompt and variable ones in the user message — so the system prompt prefix can be cached.
  • Do: keep it under ~2,000 tokens unless you have a reason. Long system prompts dilute attention to the user's actual question.
  • Don't: assume the model will follow a long system prompt forever. Drift happens; long contexts dilute instructions. Keep it tight.
  • Don't: put untrusted user content in the system prompt. That's the textbook prompt injection foothold.
  • Don't: dump a 100-shot example dump in the system prompt when 3 well-chosen examples would do.

A reusable system-prompt skeleton

You are {ROLE}, helping {AUDIENCE} with {DOMAIN}.

Output rules:
- Respond in {FORMAT}.
- Keep responses under {LENGTH}.
- If you don't know, say "I don't know" — do not guess.

Behavior:
- {ONE OR TWO HARD CONSTRAINTS}

When using tools:
- {TOOL SELECTION GUIDANCE}

Five fields, ~150 tokens, covers 80% of use cases. Resist the urge to write a manifesto.

What beginners get wrong

Common mistakes
  • Re-creating the message list every turn from scratch. Maintain it client-side; don't try to "reconstruct" by asking the model to summarize.
  • Putting user-supplied text in the system prompt. A user typing "ignore previous instructions and..." in the user role is much harder to exploit than in the system role.
  • Forgetting assistant turns when replaying. Skipping the model's own past responses confuses it about what was already said.
  • Mixing OpenAI's name field with Anthropic's name semantics. Cross-provider, message shape varies subtly. Don't assume.
  • Cramming everything into one giant user message. Splitting into proper turns (system → context, user → task, optional examples) helps the model and lets you cache the prefix.
  • Treating system as magical. Newer providers (Anthropic, GPT-5) treat all roles fairly equally for instruction following. The role mostly affects caching boundaries and prompt-injection surface, not weight.

Provider differences worth knowing

  • OpenAI: system is the conventional first message. Newer Responses API uses instructions separately.
  • Anthropic: system is a top-level field, not a message: client.messages.create(system="...", messages=[...]). Caching boundaries are explicit (cache_control).
  • Google Gemini: uses system_instruction and contents with parts instead of content. Roles are user and model (not assistant).
  • Open-weights (vLLM, llama.cpp): depend on the model's chat template. Get it wrong → garbage outputs. Use the model card's recommended template.
Highlight: messages-in, message-out is the entire protocol

Once you understand the message list pattern, every chat-style provider in the world is some flavor of this. Differences are details. The mental model — "a list of role-tagged turns I extend each round" — is universal.

🤔 Quick checkQuick check

→ Next: Prompting — the craft