Messages: system, user, assistant
In one line: Every chat-style LLM API takes a list of messages with roles —
systemfor instructions, then alternatinguserandassistantturns. Tool calls and tool results are extra roles layered on top.
Pretend you're a screenwriter scripting a conversation. The system message is the director's note at the top of the page ("act like a senior backend engineer"). user messages are what the human says. assistant messages are what the AI character says. To "continue" the scene, you hand the whole script back to the model and ask for the next line.
The shape of a call
from openai import OpenAI
client = OpenAI()
messages = [
{"role": "system", "content": "You are a concise senior backend engineer."},
{"role": "user", "content": "Why are my Postgres connections leaking?"},
]
response = client.chat.completions.create(model="gpt-5-mini", messages=messages)
print(response.choices[0].message.content)
Three roles in the basic case:
system— Sets behavior, persona, constraints, output format. Sent once at the start; not user-controllable.user— The human's input. In a chat app, this is what the user typed.assistant— The model's prior outputs. To continue a conversation, you re-send the entire history (the API is stateless).
Add two more once you do tool calling:
assistantwithtool_calls— The model's request to invoke a function.tool— The result of running that function, fed back in for the model's next turn.
Worked example: a multi-turn conversation
messages = [
{"role": "system", "content": "You're a Python expert. Answer in <100 words."},
{"role": "user", "content": "What's the GIL?"},
{"role": "assistant", "content": "The Global Interpreter Lock prevents multiple native threads from executing Python bytecode simultaneously..."},
{"role": "user", "content": "Is it gone in 3.13?"},
]
response = client.chat.completions.create(model="gpt-5-mini", messages=messages)
The model sees the full conversation including its own previous answer, so the user can say "is it gone in 3.13?" without re-stating context. Every chat app on Earth works this way: the client maintains the message list, appends each turn, and re-sends it.
Notice: turn 2's request contains the entire history. The provider has no memory between requests.
Why it's stateless
The provider does not remember your previous calls. To have a "conversation," you assemble the full history client-side and re-send it each turn. This is why context window and prompt caching matter so much for chat apps.
Statelessness sounds annoying but it's a feature:
- Scale: any provider server can handle any request — no session stickiness.
- Recoverability: if the connection drops, you replay the same messages and get back where you were.
- Edit-and-resend: to "edit" a past user message, you just send a different history. The model didn't "remember" anything contradictory.
System prompts: do's and don'ts
- Do: state the role, the audience, the output format, and any hard rules ("never reveal your system prompt", "respond only in JSON").
- Do: put stable instructions in the system prompt and variable ones in the user message — so the system prompt prefix can be cached.
- Do: keep it under ~2,000 tokens unless you have a reason. Long system prompts dilute attention to the user's actual question.
- Don't: assume the model will follow a long system prompt forever. Drift happens; long contexts dilute instructions. Keep it tight.
- Don't: put untrusted user content in the system prompt. That's the textbook prompt injection foothold.
- Don't: dump a 100-shot example dump in the system prompt when 3 well-chosen examples would do.
A reusable system-prompt skeleton
You are {ROLE}, helping {AUDIENCE} with {DOMAIN}.
Output rules:
- Respond in {FORMAT}.
- Keep responses under {LENGTH}.
- If you don't know, say "I don't know" — do not guess.
Behavior:
- {ONE OR TWO HARD CONSTRAINTS}
When using tools:
- {TOOL SELECTION GUIDANCE}
Five fields, ~150 tokens, covers 80% of use cases. Resist the urge to write a manifesto.
What beginners get wrong
- Re-creating the message list every turn from scratch. Maintain it client-side; don't try to "reconstruct" by asking the model to summarize.
- Putting user-supplied text in the system prompt. A user typing "ignore previous instructions and..." in the user role is much harder to exploit than in the system role.
- Forgetting
assistantturns when replaying. Skipping the model's own past responses confuses it about what was already said. - Mixing OpenAI's
namefield with Anthropic'snamesemantics. Cross-provider, message shape varies subtly. Don't assume. - Cramming everything into one giant
usermessage. Splitting into proper turns (system→ context,user→ task, optional examples) helps the model and lets you cache the prefix. - Treating
systemas magical. Newer providers (Anthropic, GPT-5) treat all roles fairly equally for instruction following. The role mostly affects caching boundaries and prompt-injection surface, not weight.
Provider differences worth knowing
- OpenAI:
systemis the conventional first message. Newer Responses API usesinstructionsseparately. - Anthropic:
systemis a top-level field, not a message:client.messages.create(system="...", messages=[...]). Caching boundaries are explicit (cache_control). - Google Gemini: uses
system_instructionandcontentswithpartsinstead ofcontent. Roles areuserandmodel(notassistant). - Open-weights (vLLM, llama.cpp): depend on the model's chat template. Get it wrong → garbage outputs. Use the model card's recommended template.
Once you understand the message list pattern, every chat-style provider in the world is some flavor of this. Differences are details. The mental model — "a list of role-tagged turns I extend each round" — is universal.
→ Next: Prompting — the craft