Streaming


Streaming lets your application receive model output incrementally as it is generated instead of waiting for the full response to complete.

With SolRouter, streaming uses standard Server-Sent Events (SSE) over the same OpenAI-compatible /chat/completions endpoint. You enable it by setting:

{
  "stream": true
}

This is ideal for:

  • Chat interfaces that should feel responsive
  • Long answers where early tokens matter
  • Tool-calling workflows that benefit from partial output
  • Reducing perceived latency in production UIs

Base URL

https://api.solrouter.io/ai

How streaming works

When stream: true is included in the request body, SolRouter returns the response with the content type:

text/event-stream

Instead of a single JSON payload, the server sends a sequence of data: events.

Typical flow:

  1. Your client sends POST /chat/completions
  2. The model starts generating output
  3. SolRouter forwards chunks as SSE events
  4. Your client appends text as chunks arrive
  5. A final [DONE] event signals completion

A typical stream looks like this:

data: {"id":"chatcmpl_1","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":"Low"},"index":0}]}

data: {"id":"chatcmpl_1","object":"chat.completion.chunk","choices":[{"delta":{"content":" latency"},"index":0}]}

data: {"id":"chatcmpl_1","object":"chat.completion.chunk","choices":[{"delta":{"content":" feels like"},"index":0}]}

data: {"id":"chatcmpl_1","object":"chat.completion.chunk","choices":[{"delta":{"content":" magic."},"index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"completion_tokens":6,"total_tokens":20,"cost":0.0000027}}

data: [DONE]

Basic streaming request

curl

curl https://api.solrouter.io/ai/chat/completions \
  -H "Authorization: Bearer $SOLROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a short haiku about speed." }
    ]
  }'

Request body

{
  "model": "openai/gpt-4o-mini",
  "stream": true,
  "messages": [
    {
      "role": "user",
      "content": "Write a short haiku about speed."
    }
  ]
}

Chunk format

Each SSE data: line contains a JSON object representing a partial completion chunk.

Common chunk fields

Field | Type | Description
----- | ---- | -----------
id | string | Completion identifier
object | string | Usually chat.completion.chunk
created | number | Unix timestamp
model | string | Model used for generation
choices | array | Partial output choices
usage | object | Token usage and cost, typically included near the end of the stream

choices[0].delta

The delta object contains incremental output.

Field | Type | Description
----- | ---- | -----------
role | string | Usually appears in the first chunk
content | string | Partial text fragment
tool_calls | array | Partial tool call fragments when tool calling is used

Example first chunk

{
  "id": "chatcmpl_1",
  "object": "chat.completion.chunk",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": "Low"
      }
    }
  ]
}

Example middle chunk

{
  "id": "chatcmpl_1",
  "object": "chat.completion.chunk",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": " latency"
      }
    }
  ]
}

Example final chunk

{
  "id": "chatcmpl_1",
  "object": "chat.completion.chunk",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 6,
    "total_tokens": 20,
    "cost": 0.0000027
  }
}

Stream terminator

data: [DONE]

Streaming in TypeScript with fetch

The fetch API, available in modern browsers and Node.js 18+, can read the response body as a stream. The example below runs under Node.js (it uses process.env and process.stdout); in the browser, substitute your own key handling and rendering.

const response = await fetch("https://api.solrouter.io/ai/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SOLROUTER_API_KEY}`,
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
  },
  body: JSON.stringify({
    model: "openai/gpt-4o-mini",
    stream: true,
    messages: [
      { role: "user", content: "Explain streaming in one paragraph." },
    ],
  }),
});

if (!response.ok || !response.body) {
  throw new Error(`Request failed: ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();

let buffer = "";
let fullText = "";
let usage: unknown = null;

while (true) {
  const { value, done } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });

  const parts = buffer.split("\n\n");
  buffer = parts.pop() ?? "";

  for (const part of parts) {
    const line = part.trim();
    if (!line.startsWith("data: ")) continue;

    const payload = line.slice(6);

    if (payload === "[DONE]") {
      console.log("Stream complete");
      continue;
    }

    const chunk = JSON.parse(payload);
    const delta = chunk.choices?.[0]?.delta?.content ?? "";

    if (delta) {
      fullText += delta;
      process.stdout.write(delta);
    }

    if (chunk.usage) {
      usage = chunk.usage;
    }
  }
}

console.log("\nFinal text:", fullText);
console.log("Usage:", usage);

Streaming in Node.js with the OpenAI SDK

Many OpenAI-compatible SDKs support streaming directly when pointed at SolRouter.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.solrouter.io/ai",
  apiKey: process.env.SOLROUTER_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  stream: true,
  messages: [
    { role: "user", content: "Write a short poem about APIs." },
  ],
});

for await (const chunk of stream) {
  const text = chunk.choices?.[0]?.delta?.content ?? "";
  process.stdout.write(text);
}

If your SDK exposes usage in the final chunk, you can inspect it there.


Streaming in Python

Using httpx

import json
import os
import httpx

url = "https://api.solrouter.io/ai/chat/completions"

headers = {
    "Authorization": f"Bearer {os.environ['SOLROUTER_API_KEY']}",
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
}

payload = {
    "model": "openai/gpt-4o-mini",
    "stream": True,
    "messages": [
        {"role": "user", "content": "Explain what SSE streaming is."}
    ],
}

full_text = ""
usage = None

with httpx.stream("POST", url, headers=headers, json=payload, timeout=60.0) as response:
    response.raise_for_status()

    for line in response.iter_lines():
        if not line or not line.startswith("data: "):
            continue

        data = line[6:]

        if data == "[DONE]":
            break

        chunk = json.loads(data)
        delta = chunk.get("choices", [{}])[0].get("delta", {}).get("content", "")

        if delta:
            full_text += delta
            print(delta, end="", flush=True)

        if "usage" in chunk:
            usage = chunk["usage"]

print("\n")
print("Final text:", full_text)
print("Usage:", usage)

Using the OpenAI Python SDK

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.solrouter.io/ai",
    api_key=os.environ["SOLROUTER_API_KEY"],
)

stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    stream=True,
    messages=[
        {"role": "user", "content": "Write a slogan for a fast AI API."}
    ],
)

for chunk in stream:
    if not chunk.choices:
        continue  # e.g. a trailing usage-only chunk
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

Reconstructing the final message

Streaming sends fragments, so your client usually needs to reconstruct the full output.

Simple concatenation

For plain text responses, concatenate every delta.content fragment in order:

let fullText = "";

for await (const chunk of stream) {
  const part = chunk.choices?.[0]?.delta?.content ?? "";
  fullText += part;
}
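
If you request multiple completions per call (for example with a parameter like n, where supported), chunks for different choices can interleave, so concatenating choices[0] alone is not enough. A minimal sketch, assuming the chunk shape shown earlier in this guide, that groups fragments by choice index:

```typescript
// Sketch: accumulate text per choice index so interleaved chunks from
// multiple completions do not get mixed together.
type Chunk = {
  choices?: Array<{ index?: number; delta?: { content?: string } }>;
};

function accumulateByChoice(chunks: Chunk[]): string[] {
  const texts: string[] = [];

  for (const chunk of chunks) {
    for (const choice of chunk.choices ?? []) {
      const index = choice.index ?? 0;
      const part = choice.delta?.content ?? "";
      texts[index] = (texts[index] ?? "") + part;
    }
  }

  return texts;
}

// Example with two interleaved choices (sample data, not a live stream):
const sample: Chunk[] = [
  { choices: [{ index: 0, delta: { content: "Fast" } }] },
  { choices: [{ index: 1, delta: { content: "Quick" } }] },
  { choices: [{ index: 0, delta: { content: " APIs" } }] },
  { choices: [{ index: 1, delta: { content: " wins" } }] },
];

console.log(accumulateByChoice(sample)); // → [ "Fast APIs", "Quick wins" ]
```

For the common single-completion case, this reduces to the simple concatenation shown above.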

Why reconstruction matters

You often need the final assembled text for:

  • Saving chat history
  • Database storage
  • Analytics
  • Post-processing
  • Markdown rendering
  • Structured parsing after generation completes

Streaming with tool calling

When tool calling is enabled, streamed chunks may include partial tool_calls data instead of plain text.

Example request

{
  "model": "openai/gpt-4o-mini",
  "stream": true,
  "messages": [
    {
      "role": "user",
      "content": "What's the weather in Tokyo?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Returns weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    }
  ]
}

Example streamed tool chunks

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_1","type":"function","function":{"name":"get_weather","arguments":""}}]}}]}

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"city\":\"To"}}]}}]}

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"kyo\"}"}}]}}]}

data: {"choices":[{"delta":{},"finish_reason":"tool_calls"}]}

Reconstructing tool arguments

Tool arguments may arrive in pieces and need to be concatenated per tool call, keyed by each fragment's index.

const toolCalls: Record<number, { id?: string; name?: string; arguments: string }> = {};

for await (const chunk of stream) {
  const calls = chunk.choices?.[0]?.delta?.tool_calls ?? [];

  for (const call of calls) {
    const index = call.index ?? 0;

    if (!toolCalls[index]) {
      toolCalls[index] = { arguments: "" };
    }

    if (call.id) {
      toolCalls[index].id = call.id;
    }

    if (call.function?.name) {
      toolCalls[index].name = call.function.name;
    }

    if (call.function?.arguments) {
      toolCalls[index].arguments += call.function.arguments;
    }
  }
}

After the stream ends, parse the reconstructed argument string as JSON.
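
A minimal sketch of that final step, using the same accumulator shape as above. The get_weather handler and the sample toolCalls data are hypothetical stand-ins for your own tool implementations and a real completed stream:

```typescript
// Sketch: parse reconstructed tool-call arguments after the stream ends
// and dispatch each call to a handler. Sample data stands in for the
// accumulator filled during streaming.
const toolCalls: Record<number, { id?: string; name?: string; arguments: string }> = {
  0: { id: "call_1", name: "get_weather", arguments: '{"city":"Tokyo"}' },
};

// Hypothetical tool implementations keyed by function name.
const handlers: Record<string, (args: Record<string, unknown>) => string> = {
  get_weather: (args) => `Looked up weather for ${args.city}`,
};

const results: string[] = [];

for (const call of Object.values(toolCalls)) {
  if (!call.name) continue;

  // The arguments string is only valid JSON once every fragment has arrived.
  const args = JSON.parse(call.arguments) as Record<string, unknown>;
  const handler = handlers[call.name];

  if (handler) {
    results.push(handler(args));
  }
}

console.log(results); // → [ "Looked up weather for Tokyo" ]
```

In a real tool-calling loop, you would send each result back to the model as a tool message and continue the conversation.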


Streaming in React

A typical React pattern is to append incoming chunks into component state.

import { useState } from "react";

export function useStreamingChat() {
  const [text, setText] = useState("");
  const [loading, setLoading] = useState(false);

  async function run(prompt: string) {
    setLoading(true);
    setText("");

    const response = await fetch("https://api.solrouter.io/ai/chat/completions", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.NEXT_PUBLIC_SOLROUTER_API_KEY}`,
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
      },
      body: JSON.stringify({
        model: "openai/gpt-4o-mini",
        stream: true,
        messages: [{ role: "user", content: prompt }],
      }),
    });

    if (!response.ok || !response.body) {
      setLoading(false);
      throw new Error(`Request failed: ${response.status}`);
    }

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";

    while (true) {
      const { value, done } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const events = buffer.split("\n\n");
      buffer = events.pop() ?? "";

      for (const event of events) {
        const line = event.trim();
        if (!line.startsWith("data: ")) continue;

        const payload = line.slice(6);
        if (payload === "[DONE]") continue;

        const chunk = JSON.parse(payload);
        const delta = chunk.choices?.[0]?.delta?.content ?? "";

        if (delta) {
          setText((prev) => prev + delta);
        }
      }
    }

    setLoading(false);
  }

  return { text, loading, run };
}

In production, do not expose a private server-side API key directly to the browser. Prefer a server-side route handler or backend proxy for client applications.


Error handling

Streaming requests can fail in two ways:

  1. The HTTP request fails before streaming begins
  2. The stream starts, then ends early or sends an error payload

Check the initial HTTP status

if (!response.ok) {
  const text = await response.text();
  throw new Error(`Request failed: ${response.status} ${text}`);
}
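
For the second failure mode, an error can arrive inside an already-started stream as a data: event. The { error: { message } } payload shape below is an assumption (providers vary), so check for it defensively before treating a chunk as a content delta:

```typescript
// Sketch: classify each SSE payload before extracting text.
// The { error: { message } } shape is an assumption; providers vary.
type StreamEvent =
  | { error: { message: string; code?: string } }
  | { choices?: Array<{ delta?: { content?: string } }> };

function classifyEvent(payload: string): { kind: "done" | "error" | "delta"; text: string } {
  if (payload === "[DONE]") {
    return { kind: "done", text: "" };
  }

  const parsed = JSON.parse(payload) as StreamEvent;

  if ("error" in parsed) {
    return { kind: "error", text: parsed.error.message };
  }

  return { kind: "delta", text: parsed.choices?.[0]?.delta?.content ?? "" };
}

console.log(classifyEvent('{"error":{"message":"upstream timeout"}}'));
// → { kind: "error", text: "upstream timeout" }
```

When you detect an error mid-stream, surface it to the user and decide whether the partial text you already rendered should be kept or discarded.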

Common failure scenarios

Problem | What it means | What to do
------- | ------------- | ----------
401 Unauthorized | Missing or invalid API key | Verify Authorization: Bearer sr_...
402 Payment Required | No balance for paid model | Top up credits or use a free model
404 Not Found | Invalid path or unknown model | Check endpoint and model ID
429 Too Many Requests | Rate limit reached | Retry with backoff
Early connection close | Upstream provider interruption | Retry idempotent requests
Malformed chunk parsing | Buffer split incorrectly | Use proper SSE parsing logic

Retry strategy

For network failures and 429 / 5xx responses, use exponential backoff.

async function streamWithRetry(makeRequest: () => Promise<Response>, retries = 3) {
  for (let attempt = 0; ; attempt++) {
    let response: Response | undefined;

    try {
      response = await makeRequest();
    } catch (error) {
      // Network failure before any response: retryable.
      if (attempt >= retries) throw error;
    }

    if (response) {
      if (response.ok) {
        return response;
      }

      // Client errors other than 429 will not succeed on retry,
      // so propagate them immediately instead of retrying.
      if (response.status < 500 && response.status !== 429) {
        throw new Error(`Non-retryable error: ${response.status}`);
      }

      if (attempt >= retries) {
        throw new Error(`Retryable error: ${response.status}`);
      }
    }

    const delayMs = 500 * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}

Best practices

1. Always parse SSE incrementally

Do not assume each network read contains a full JSON object. Buffer partial data and split on SSE event boundaries.
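
This buffering logic can be factored into a small reusable parser. A minimal sketch: feed it raw text in any-sized pieces, and it only emits payloads once a full "\n\n"-terminated event has arrived.

```typescript
// Sketch: a reusable incremental SSE parser. Partial events stay in the
// buffer until the "\n\n" event boundary arrives.
function createSSEParser(onPayload: (payload: string) => void) {
  let buffer = "";

  return (piece: string) => {
    buffer += piece;

    const events = buffer.split("\n\n");
    buffer = events.pop() ?? ""; // keep any incomplete trailing event

    for (const event of events) {
      const line = event.trim();
      if (line.startsWith("data: ")) {
        onPayload(line.slice(6));
      }
    }
  };
}

// A read that splits mid-JSON is held until the event completes:
const payloads: string[] = [];
const feed = createSSEParser((p) => payloads.push(p));

feed('data: {"a":');
feed('1}\n\ndata: [DONE]\n\n');

console.log(payloads); // → [ '{"a":1}', "[DONE]" ]
```

In your streaming loop, call the returned function with each decoded network read instead of splitting the buffer inline.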

2. Store the final assembled output

Even if you render partial text live, keep a reconstructed final string for persistence and analytics.

3. Capture usage from the final chunk

Usage data is typically only available near the end of the stream. Save it separately from the text buffer.

4. Handle tool calls as structured state

Tool arguments may arrive in fragments. Reconstruct them by call index rather than treating them like plain text.

5. Use streaming for UX, not just speed

Streaming often does not reduce total compute time, but it dramatically improves perceived responsiveness.

6. Keep prompts concise

Long prompts increase time-to-first-token and may reduce the UX benefit of streaming.

7. Prefer server-side streaming in production apps

For web apps, proxy streaming through your backend or route handlers when possible. This keeps your private credentials off the client.


Minimal end-to-end example

const response = await fetch("https://api.solrouter.io/ai/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SOLROUTER_API_KEY}`,
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
  },
  body: JSON.stringify({
    model: "anthropic/claude-sonnet-4",
    stream: true,
    messages: [
      { role: "user", content: "Give me three short tips for building a CLI." }
    ],
  }),
});

if (!response.ok || !response.body) {
  throw new Error(`Request failed: ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let result = "";

while (true) {
  const { value, done } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });

  const events = buffer.split("\n\n");
  buffer = events.pop() ?? "";

  for (const event of events) {
    const line = event.trim();
    if (!line.startsWith("data: ")) continue;

    const data = line.slice(6);
    if (data === "[DONE]") continue;

    const chunk = JSON.parse(data);
    const delta = chunk.choices?.[0]?.delta?.content ?? "";

    if (delta) {
      result += delta;
      process.stdout.write(delta);
    }
  }
}

console.log("\n\nFinal result:", result);

Next steps