Streaming
Streaming lets your application receive model output incrementally as it is generated instead of waiting for the full response to complete.
With SolRouter, streaming uses standard Server-Sent Events (SSE) over the same OpenAI-compatible /chat/completions endpoint. You enable it by setting:
{
"stream": true
}
This is ideal for:
- Chat interfaces that should feel responsive
- Long answers where early tokens matter
- Tool-calling workflows that benefit from partial output
- Reducing perceived latency in production UIs
Base URL
https://api.solrouter.io/ai
How streaming works
When stream: true is included in the request body, SolRouter returns a response with content type similar to:
text/event-stream
Instead of a single JSON payload, the server sends a sequence of data: events.
Typical flow:
- Your client sends POST /chat/completions
- The model starts generating output
- SolRouter forwards chunks as SSE events
- Your client appends text as chunks arrive
- A final [DONE] event signals completion
A typical stream looks like this:
data: {"id":"chatcmpl_1","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":"Low"},"index":0}]}
data: {"id":"chatcmpl_1","object":"chat.completion.chunk","choices":[{"delta":{"content":" latency"},"index":0}]}
data: {"id":"chatcmpl_1","object":"chat.completion.chunk","choices":[{"delta":{"content":" feels like"},"index":0}]}
data: {"id":"chatcmpl_1","object":"chat.completion.chunk","choices":[{"delta":{"content":" magic."},"index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"completion_tokens":6,"total_tokens":20,"cost":0.0000027}}
data: [DONE]
Basic streaming request
curl
curl https://api.solrouter.io/ai/chat/completions \
  -H "Authorization: Bearer $SOLROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a short haiku about speed." }
    ]
  }'
Request body
{
  "model": "openai/gpt-4o-mini",
  "stream": true,
  "messages": [
    {
      "role": "user",
      "content": "Write a short haiku about speed."
    }
  ]
}
Chunk format
Each SSE data: line contains a JSON object representing a partial completion chunk.
Common chunk fields
| Field | Type | Description |
|---|---|---|
| id | string | Completion identifier |
| object | string | Usually chat.completion.chunk |
| created | number | Unix timestamp |
| model | string | Model used for generation |
| choices | array | Partial output choices |
| usage | object | Token usage and cost, typically included near the end of the stream |
choices[0].delta
The delta object contains incremental output.
| Field | Type | Description |
|---|---|---|
| role | string | Usually appears in the first chunk |
| content | string | Partial text fragment |
| tool_calls | array | Partial tool call fragments when tool calling is used |
Example first chunk
{
  "id": "chatcmpl_1",
  "object": "chat.completion.chunk",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": "Low"
      }
    }
  ]
}
Example middle chunk
{
  "id": "chatcmpl_1",
  "object": "chat.completion.chunk",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": " latency"
      }
    }
  ]
}
Example final chunk
{
  "id": "chatcmpl_1",
  "object": "chat.completion.chunk",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 6,
    "total_tokens": 20,
    "cost": 0.0000027
  }
}
Stream terminator
data: [DONE]
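The chunk fields described above can be summarized as a TypeScript shape. This is a non-authoritative sketch based only on the fields documented on this page; real chunks may carry additional fields:

```typescript
// Sketch of the chunk shape documented above. Most fields are optional
// because any given chunk carries only a subset of them.
interface StreamDelta {
  role?: string;          // usually only on the first chunk
  content?: string;       // partial text fragment
  tool_calls?: unknown[]; // partial tool call fragments
}

interface StreamChoice {
  index: number;
  delta: StreamDelta;
  finish_reason?: string; // e.g. "stop" on the final chunk
}

interface StreamChunk {
  id: string;
  object: string; // usually "chat.completion.chunk"
  created?: number;
  model?: string;
  choices: StreamChoice[];
  usage?: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
    cost?: number;
  };
}

// The final chunk from the example above, typed:
const finalChunk: StreamChunk = {
  id: "chatcmpl_1",
  object: "chat.completion.chunk",
  choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
  usage: { prompt_tokens: 14, completion_tokens: 6, total_tokens: 20, cost: 0.0000027 },
};
```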
Streaming in TypeScript with fetch
The fetch API, available in browsers and modern Node.js runtimes, can read the response body as a stream. The example below is written for Node.js (it uses process.env and process.stdout).
const response = await fetch("https://api.solrouter.io/ai/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SOLROUTER_API_KEY}`,
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
  },
  body: JSON.stringify({
    model: "openai/gpt-4o-mini",
    stream: true,
    messages: [
      { role: "user", content: "Explain streaming in one paragraph." },
    ],
  }),
});

if (!response.ok || !response.body) {
  throw new Error(`Request failed: ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();

let buffer = "";
let fullText = "";
let usage: unknown = null;

while (true) {
  const { value, done } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });

  // SSE events are separated by a blank line; keep any trailing
  // partial event in the buffer until the next read completes it.
  const parts = buffer.split("\n\n");
  buffer = parts.pop() ?? "";

  for (const part of parts) {
    const line = part.trim();
    if (!line.startsWith("data: ")) continue;

    const payload = line.slice(6);
    if (payload === "[DONE]") {
      console.log("Stream complete");
      continue;
    }

    const chunk = JSON.parse(payload);
    const delta = chunk.choices?.[0]?.delta?.content ?? "";
    if (delta) {
      fullText += delta;
      process.stdout.write(delta);
    }
    if (chunk.usage) {
      usage = chunk.usage;
    }
  }
}

console.log("\nFinal text:", fullText);
console.log("Usage:", usage);
Streaming in Node.js with the OpenAI SDK
Many OpenAI-compatible SDKs support streaming directly when pointed at SolRouter.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.solrouter.io/ai",
  apiKey: process.env.SOLROUTER_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  stream: true,
  messages: [
    { role: "user", content: "Write a short poem about APIs." },
  ],
});

for await (const chunk of stream) {
  const text = chunk.choices?.[0]?.delta?.content ?? "";
  process.stdout.write(text);
}
If your SDK exposes usage in the final chunk, you can inspect it there.
Streaming in Python
Using httpx
import json
import os

import httpx

url = "https://api.solrouter.io/ai/chat/completions"

headers = {
    "Authorization": f"Bearer {os.environ['SOLROUTER_API_KEY']}",
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
}

payload = {
    "model": "openai/gpt-4o-mini",
    "stream": True,
    "messages": [
        {"role": "user", "content": "Explain what SSE streaming is."}
    ],
}

full_text = ""
usage = None

with httpx.stream("POST", url, headers=headers, json=payload, timeout=60.0) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line or not line.startswith("data: "):
            continue
        data = line[6:]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk.get("choices", [{}])[0].get("delta", {}).get("content", "")
        if delta:
            full_text += delta
            print(delta, end="", flush=True)
        if "usage" in chunk:
            usage = chunk["usage"]

print("\n")
print("Final text:", full_text)
print("Usage:", usage)
Using the OpenAI Python SDK
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.solrouter.io/ai",
    api_key=os.environ["SOLROUTER_API_KEY"],
)

stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    stream=True,
    messages=[
        {"role": "user", "content": "Write a slogan for a fast AI API."}
    ],
)

for chunk in stream:
    if not chunk.choices:
        continue  # some providers send a usage-only final chunk with no choices
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
Reconstructing the final message
Streaming sends fragments, so your client usually needs to reconstruct the full output.
Simple concatenation
For plain text responses, concatenate every delta.content fragment in order:
let fullText = "";

for await (const chunk of stream) {
  const part = chunk.choices?.[0]?.delta?.content ?? "";
  fullText += part;
}
Why reconstruction matters
You often need the final assembled text for:
- Saving chat history
- Database storage
- Analytics
- Post-processing
- Markdown rendering
- Structured parsing after generation completes
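One common use: once the stream ends, append the assembled text back into your message history so the next request carries full context. A minimal sketch:

```typescript
// Sketch: append the reconstructed assistant message to the conversation
// so a follow-up request includes the full history.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

const messages: ChatMessage[] = [
  { role: "user", content: "Explain streaming in one paragraph." },
];

// Assume fullText was assembled from delta.content fragments as shown above.
const fullText = "Streaming delivers output incrementally.";
messages.push({ role: "assistant", content: fullText });

// The next user turn is appended after the assistant message.
messages.push({ role: "user", content: "Shorten that to one sentence." });
```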
Streaming with tool calling
When tool calling is enabled, streamed chunks may include partial tool_calls data instead of plain text.
Example request
{
  "model": "openai/gpt-4o-mini",
  "stream": true,
  "messages": [
    {
      "role": "user",
      "content": "What's the weather in Tokyo?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Returns weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    }
  ]
}
Example streamed tool chunks
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_1","type":"function","function":{"name":"get_weather","arguments":""}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"city\":\"To"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"kyo\"}"}}]}}]}
data: {"choices":[{"delta":{},"finish_reason":"tool_calls"}]}
Reconstructing tool arguments
Tool arguments may arrive in pieces and must be concatenated per tool call, keyed by each fragment's index field.
const toolCalls: Record<number, { id?: string; name?: string; arguments: string }> = {};

for await (const chunk of stream) {
  const calls = chunk.choices?.[0]?.delta?.tool_calls ?? [];
  for (const call of calls) {
    const index = call.index ?? 0;
    if (!toolCalls[index]) {
      toolCalls[index] = { arguments: "" };
    }
    if (call.id) {
      toolCalls[index].id = call.id;
    }
    if (call.function?.name) {
      toolCalls[index].name = call.function.name;
    }
    if (call.function?.arguments) {
      toolCalls[index].arguments += call.function.arguments;
    }
  }
}
After the stream ends, parse the reconstructed argument string as JSON.
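For example, with the Tokyo stream shown earlier, the argument fragments only form valid JSON once concatenated in order (a sketch; fragment boundaries vary by provider and model):

```typescript
// The argument fragments from the example stream above, concatenated in order.
const fragments = ["", '{"city":"To', 'kyo"}'];
const argumentString = fragments.join("");

// Only parse after finish_reason is "tool_calls" — a partial string is not valid JSON.
const args = JSON.parse(argumentString) as { city: string };
console.log(args.city); // prints "Tokyo"
```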
Streaming in React
A typical React pattern is to append incoming chunks into component state.
import { useState } from "react";

export function useStreamingChat() {
  const [text, setText] = useState("");
  const [loading, setLoading] = useState(false);

  async function run(prompt: string) {
    setLoading(true);
    setText("");

    const response = await fetch("https://api.solrouter.io/ai/chat/completions", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.NEXT_PUBLIC_SOLROUTER_API_KEY}`,
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
      },
      body: JSON.stringify({
        model: "openai/gpt-4o-mini",
        stream: true,
        messages: [{ role: "user", content: prompt }],
      }),
    });

    if (!response.body) {
      setLoading(false);
      throw new Error("Missing response body");
    }

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";

    while (true) {
      const { value, done } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const events = buffer.split("\n\n");
      buffer = events.pop() ?? "";

      for (const event of events) {
        const line = event.trim();
        if (!line.startsWith("data: ")) continue;
        const payload = line.slice(6);
        if (payload === "[DONE]") continue;

        const chunk = JSON.parse(payload);
        const delta = chunk.choices?.[0]?.delta?.content ?? "";
        if (delta) {
          setText((prev) => prev + delta);
        }
      }
    }

    setLoading(false);
  }

  return { text, loading, run };
}
In production, do not expose a private server-side API key directly to the browser. Prefer a server-side route handler or backend proxy for client applications.
Error handling
Streaming requests can fail in two ways:
- The HTTP request fails before streaming begins
- The stream starts, then ends early or sends an error payload
Check the initial HTTP status
if (!response.ok) {
  const text = await response.text();
  throw new Error(`Request failed: ${response.status} ${text}`);
}
Common failure scenarios
| Problem | What it means | What to do |
|---|---|---|
| 401 Unauthorized | Missing or invalid API key | Verify Authorization: Bearer sr_... |
| 402 Payment Required | No balance for paid model | Top up credits or use a free model |
| 404 Not Found | Invalid path or unknown model | Check endpoint and model ID |
| 429 Too Many Requests | Rate limit reached | Retry with backoff |
| Early connection close | Upstream provider interruption | Retry idempotent requests |
| Malformed chunk parsing | Buffer split incorrectly | Use proper SSE parsing logic |
Retry strategy
For network failures and 429 / 5xx responses, use exponential backoff.
class NonRetryableError extends Error {}

async function streamWithRetry(makeRequest: () => Promise<Response>, retries = 3) {
  let attempt = 0;

  while (true) {
    try {
      const response = await makeRequest();
      if (response.ok) {
        return response;
      }
      if (response.status < 500 && response.status !== 429) {
        // Client errors other than 429 will not succeed on retry.
        throw new NonRetryableError(`Non-retryable error: ${response.status}`);
      }
      throw new Error(`Retryable error: ${response.status}`);
    } catch (error) {
      if (error instanceof NonRetryableError) {
        throw error; // do not retry client errors
      }
      attempt += 1;
      if (attempt > retries) {
        throw error;
      }
      const delayMs = 500 * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
Best practices
1. Always parse SSE incrementally
Do not assume each network read contains a full JSON object. Buffer partial data and split on SSE event boundaries.
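The buffering pattern used throughout this page can be isolated into a small pure helper (a sketch; the name feedSseBuffer is ours):

```typescript
// Sketch of incremental SSE buffering: feed each network read into the buffer,
// emit only complete events (separated by a blank line), and keep the remainder
// for the next read.
function feedSseBuffer(buffer: string, read: string): { events: string[]; buffer: string } {
  const combined = buffer + read;
  const parts = combined.split("\n\n");
  const remainder = parts.pop() ?? "";
  return { events: parts, buffer: remainder };
}

// A JSON payload split across two reads is only emitted once it is complete.
let state = feedSseBuffer("", 'data: {"choices":[{"delta":{"con');
console.log(state.events.length); // prints 0 — the event is still incomplete
state = feedSseBuffer(state.buffer, 'tent":"Hi"},"index":0}]}\n\n');
console.log(state.events[0]); // the full event, safe to parse after stripping "data: "
```

Because the helper is pure, it is easy to unit test against reads split at arbitrary byte boundaries.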
2. Store the final assembled output
Even if you render partial text live, keep a reconstructed final string for persistence and analytics.
3. Capture usage from the final chunk
Usage data is typically only available near the end of the stream. Save it separately from the text buffer.
4. Handle tool calls as structured state
Tool arguments may arrive in fragments. Reconstruct them by call index rather than treating them like plain text.
5. Use streaming for UX, not just speed
Streaming often does not reduce total compute time, but it dramatically improves perceived responsiveness.
6. Keep prompts concise
Long prompts increase time-to-first-token and may reduce the UX benefit of streaming.
7. Prefer server-side streaming in production apps
For web apps, proxy streaming through your backend or route handlers when possible. This keeps your private credentials off the client.
Minimal end-to-end example
const response = await fetch("https://api.solrouter.io/ai/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SOLROUTER_API_KEY}`,
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
  },
  body: JSON.stringify({
    model: "anthropic/claude-sonnet-4",
    stream: true,
    messages: [
      { role: "user", content: "Give me three short tips for building a CLI." }
    ],
  }),
});

if (!response.ok || !response.body) {
  throw new Error(`Request failed: ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let result = "";

while (true) {
  const { value, done } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const events = buffer.split("\n\n");
  buffer = events.pop() ?? "";

  for (const event of events) {
    const line = event.trim();
    if (!line.startsWith("data: ")) continue;
    const data = line.slice(6);
    if (data === "[DONE]") continue;

    const chunk = JSON.parse(data);
    const delta = chunk.choices?.[0]?.delta?.content ?? "";
    if (delta) {
      result += delta;
      process.stdout.write(delta);
    }
  }
}

console.log("\n\nFinal result:", result);
Next steps
- API Reference — request and response schemas across endpoints
- Tool Calling — full function execution workflow
- Structured Output — JSON mode and schema-constrained responses
- Vision & Multimodal — sending images and mixed content blocks
- Errors — retry strategy, rate limits, and failure handling