Conversations & System Prompts


System prompts

A system prompt is a special message with role: "system" placed at the beginning of the messages array. It gives the model persistent instructions that shape its personality, tone, output format, or area of expertise.

System messages are supported by all major models. If a model does not natively support a system role, SolRouter automatically converts it to a leading user message.
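This fallback happens server-side, but conceptually it behaves like the sketch below (an illustration, not SolRouter's actual implementation):

```typescript
type Message = {
  role: "system" | "user" | "assistant";
  content: string;
};

// Illustrative only: if the target model has no system role, fold the
// leading system message into a user message at the front of the array.
function convertSystemMessage(messages: Message[]): Message[] {
  if (messages.length === 0 || messages[0].role !== "system") {
    return messages;
  }
  const [system, ...rest] = messages;
  return [{ role: "user", content: system.content }, ...rest];
}
```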

Basic system prompt

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.solrouter.io/v1",
  apiKey: process.env.SOLROUTER_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [
    {
      role: "system",
      content:
        "You are a concise technical assistant. Always respond in plain text. " +
        "Keep answers under 3 sentences. Do not use bullet points or markdown.",
    },
    {
      role: "user",
      content: "What is a closure in JavaScript?",
    },
  ],
});

Practical example — customer support bot

const SYSTEM_PROMPT = `
You are Aria, a friendly and knowledgeable support assistant for Acme Corp.
Your job is to help users with questions about their orders, account settings, and product features.

Guidelines:
- Always greet users by name if they provide it.
- If you do not know the answer, say so clearly and offer to escalate to a human agent.
- Never make up order numbers, dates, or policy details.
- Keep responses short and scannable. Use bullet points for lists of steps.
- Do not discuss topics unrelated to Acme Corp products.
`.trim();

const completion = await client.chat.completions.create({
  model: "anthropic/claude-haiku-3-5",
  messages: [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: "Hi! My order hasn't arrived yet. Order #AX-99812." },
  ],
});

Tips for writing effective system prompts

  • Assign a persona: "You are a senior data engineer specializing in SQL."
  • Specify output format: "Always return valid JSON. Never include prose outside the JSON object."
  • Set length constraints: "Limit all responses to 100 words or fewer."
  • Define what NOT to do: "Do not speculate. If uncertain, say 'I don't know'."
  • Provide context/data: inject user preferences, product info, or relevant documents into the system prompt at request time.
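The last tip, injecting context at request time, can be sketched as a small helper that assembles the system prompt per request. The user-profile shape here is hypothetical; adapt it to your own data model:

```typescript
// Hypothetical user-profile shape for illustration.
type UserProfile = {
  name: string;
  plan: "free" | "pro";
};

// Build a per-request system prompt that embeds user context.
function buildSystemPrompt(profile: UserProfile): string {
  return [
    "You are Aria, a support assistant for Acme Corp.",
    `The current user is ${profile.name} on the ${profile.plan} plan.`,
    "Greet the user by name and tailor answers to their plan.",
  ].join("\n");
}
```

Pass the result as `{ role: "system", content: buildSystemPrompt(profile) }` when creating the completion.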

Conversation history

LLMs are stateless — they have no memory of previous requests. To build a multi-turn conversation, you must send the full history of messages on every request.

The pattern is simple:

  1. Start with your system prompt and the user's first message.
  2. After each response, append the assistant's reply to your local messages array.
  3. When the user sends a new message, append it and send the whole array again.

TypeScript example

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.solrouter.io/v1",
  apiKey: process.env.SOLROUTER_API_KEY,
});

type Message = {
  role: "system" | "user" | "assistant";
  content: string;
};

const messages: Message[] = [
  {
    role: "system",
    content: "You are a helpful assistant. Be concise and friendly.",
  },
];

async function chat(userInput: string): Promise<string> {
  messages.push({ role: "user", content: userInput });

  const completion = await client.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages,
  });

  const assistantMessage = completion.choices[0].message.content ?? "";
  messages.push({ role: "assistant", content: assistantMessage });

  return assistantMessage;
}

Managing context length

Every model has a maximum context window — the total number of tokens it can process in a single request. As a conversation grows, it will eventually exceed this limit.
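Exact token counts require the model's own tokenizer, but a common rough heuristic for English text is about four characters per token. A minimal estimate along those lines:

```typescript
type Message = {
  role: "system" | "user" | "assistant";
  content: string;
};

// Rough heuristic: ~4 characters per token for English text.
// Use the model's tokenizer when you need exact counts.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}
```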

  • Sliding window: keep only the last N messages, always preserving the system prompt. Trade-off: simple, but the model loses early context.
  • Summarization: when the window fills up, ask the model to summarize the conversation so far. Trade-off: retains key facts but adds latency and cost.
  • Selective pruning: remove messages unlikely to be relevant (e.g. stale tool call results). Trade-off: requires domain-specific logic.
  • Larger context model: use a model with a bigger context window. Trade-off: higher cost per request.

Sliding window example

const MAX_HISTORY_MESSAGES = 20;

function trimHistory(messages: Message[]): Message[] {
  const [systemPrompt, ...rest] = messages;
  const trimmed = rest.slice(-MAX_HISTORY_MESSAGES);
  return [systemPrompt, ...trimmed];
}

const completion = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: trimHistory(messages),
});
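The same idea can trim by an estimated token budget instead of a fixed message count. This sketch reuses a rough four-characters-per-token heuristic; the budget value is illustrative, not a SolRouter limit:

```typescript
type Message = {
  role: "system" | "user" | "assistant";
  content: string;
};

const MAX_PROMPT_TOKENS = 4000; // illustrative budget

// Variant of the sliding window: keep the most recent messages that fit
// within an estimated token budget, always preserving the system prompt.
function trimToBudget(
  messages: Message[],
  maxTokens: number = MAX_PROMPT_TOKENS,
): Message[] {
  const [systemPrompt, ...rest] = messages;
  const kept: Message[] = [];
  let tokens = Math.ceil(systemPrompt.content.length / 4);
  // Walk backwards so the most recent messages are kept first.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = Math.ceil(rest[i].content.length / 4);
    if (tokens + cost > maxTokens) break;
    tokens += cost;
    kept.unshift(rest[i]);
  }
  return [systemPrompt, ...kept];
}
```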

Next steps

  • Models — browse context lengths and choose the right model
  • Streaming — stream conversation replies token by token
  • Tool Calling — let the model call functions in your app