Streaming
Streaming lets your application receive model output incrementally as it is generated instead of waiting for the full response to complete.
With SolRouter, streaming uses standard Server-Sent Events (SSE) over the same OpenAI-compatible /chat/completions endpoint. You enable it by setting:
{
"stream": true
}
This is ideal for:
- Chat interfaces that should feel responsive
- Long answers where early tokens matter
- Tool-calling workflows that benefit from partial output
- Reducing perceived latency in production UIs
Base URL
https://api.solrouter.io/ai
How streaming works
When stream: true is included in the request body, SolRouter returns a response with content type similar to:
text/event-stream
Instead of a single JSON payload, the server sends a sequence of data: events.
Typical flow:
- Your client sends POST /chat/completions
- The model starts generating output
- SolRouter forwards chunks as SSE events
- Your client appends text as chunks arrive
- A final [DONE] event signals completion
A typical stream looks like this:
data: {"id":"chatcmpl_1","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":"Low"},"index":0}]}
data: {"id":"chatcmpl_1","object":"chat.completion.chunk","choices":[{"delta":{"content":" latency"},"index":0}]}
data: {"id":"chatcmpl_1","object":"chat.completion.chunk","choices":[{"delta":{"content":" feels like"},"index":0}]}
data: {"id":"chatcmpl_1","object":"chat.completion.chunk","choices":[{"delta":{"content":" magic."},"index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"completion_tokens":6,"total_tokens":20,"cost":0.0000027}}
data: [DONE]
Basic streaming request
curl
curl https://api.solrouter.io/ai/chat/completions \
  -H "Authorization: Bearer $SOLROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a short haiku about speed." }
    ]
  }'
Request body
{
  "model": "openai/gpt-4o-mini",
  "stream": true,
  "messages": [
    {
      "role": "user",
      "content": "Write a short haiku about speed."
    }
  ]
}
Chunk format
Each SSE data: line contains a JSON object representing a partial completion chunk.
Common chunk fields
| Field | Type | Description |
|---|---|---|
| id | string | Completion identifier |
| object | string | Usually chat.completion.chunk |
| created | number | Unix timestamp |
| model | string | Model used for generation |
| choices | array | Partial output choices |
| usage | object | Token usage and cost, typically included near the end of the stream |
choices[0].delta
The delta object contains incremental output.
| Field | Type | Description |
|---|---|---|
| role | string | Usually appears in the first chunk |
| content | string | Partial text fragment |
| tool_calls | array | Partial tool call fragments when tool calling is used |
Example first chunk
{
  "id": "chatcmpl_1",
  "object": "chat.completion.chunk",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": "Low"
      }
    }
  ]
}
Example middle chunk
{
  "id": "chatcmpl_1",
  "object": "chat.completion.chunk",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": " latency"
      }
    }
  ]
}
Example final chunk
{
  "id": "chatcmpl_1",
  "object": "chat.completion.chunk",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 6,
    "total_tokens": 20,
    "cost": 0.0000027
  }
}
Stream terminator
data: [DONE]
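The chunk fields described above can be summarized as a TypeScript shape. This is a non-authoritative sketch based only on the fields documented on this page; real chunks may carry additional fields:

```typescript
// Sketch of the chunk shape documented above. Most fields are optional
// because any given chunk carries only a subset of them.
interface StreamDelta {
  role?: string;          // usually only on the first chunk
  content?: string;       // partial text fragment
  tool_calls?: unknown[]; // partial tool call fragments
}

interface StreamChoice {
  index: number;
  delta: StreamDelta;
  finish_reason?: string; // e.g. "stop" on the final chunk
}

interface StreamChunk {
  id: string;
  object: string; // usually "chat.completion.chunk"
  created?: number;
  model?: string;
  choices: StreamChoice[];
  usage?: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
    cost?: number;
  };
}

// The final chunk from the example above, typed:
const finalChunk: StreamChunk = {
  id: "chatcmpl_1",
  object: "chat.completion.chunk",
  choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
  usage: { prompt_tokens: 14, completion_tokens: 6, total_tokens: 20, cost: 0.0000027 },
};
```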
Streaming in TypeScript with fetch
The fetch API, available in browsers and modern Node.js runtimes, can read the response body as a stream. The example below is written for Node.js (it uses process.env and process.stdout).
const response = await fetch("https://api.solrouter.io/ai/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SOLROUTER_API_KEY}`,
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
  },
  body: JSON.stringify({
    model: "openai/gpt-4o-mini",
    stream: true,
    messages: [
      { role: "user", content: "Explain streaming in one paragraph." },
    ],
  }),
});

if (!response.ok || !response.body) {
  throw new Error(`Request failed: ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();

let buffer = "";
let fullText = "";
let usage: unknown = null;

while (true) {
  const { value, done } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });

  // SSE events are separated by a blank line; keep any trailing
  // partial event in the buffer until the next read completes it.
  const parts = buffer.split("\n\n");
  buffer = parts.pop() ?? "";

  for (const part of parts) {
    const line = part.trim();
    if (!line.startsWith("data: ")) continue;

    const payload = line.slice(6);
    if (payload === "[DONE]") {
      console.log("Stream complete");
      continue;
    }

    const chunk = JSON.parse(payload);
    const delta = chunk.choices?.[0]?.delta?.content ?? "";
    if (delta) {
      fullText += delta;
      process.stdout.write(delta);
    }
    if (chunk.usage) {
      usage = chunk.usage;
    }
  }
}

console.log("\nFinal text:", fullText);
console.log("Usage:", usage);
Streaming in Node.js with the OpenAI SDK
Many OpenAI-compatible SDKs support streaming directly when pointed at SolRouter.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.solrouter.io/ai",
  apiKey: process.env.SOLROUTER_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  stream: true,
  messages: [
    { role: "user", content: "Write a short poem about APIs." },
  ],
});

for await (const chunk of stream) {
  const text = chunk.choices?.[0]?.delta?.content ?? "";
  process.stdout.write(text);
}
If your SDK exposes usage in the final chunk, you can inspect it there.
Streaming in Python
Using httpx
import json
import os

import httpx

url = "https://api.solrouter.io/ai/chat/completions"

headers = {
    "Authorization": f"Bearer {os.environ['SOLROUTER_API_KEY']}",
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
}

payload = {
    "model": "openai/gpt-4o-mini",
    "stream": True,
    "messages": [
        {"role": "user", "content": "Explain what SSE streaming is."}
    ],
}

full_text = ""
usage = None

with httpx.stream("POST", url, headers=headers, json=payload, timeout=60.0) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line or not line.startswith("data: "):
            continue
        data = line[6:]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk.get("choices", [{}])[0].get("delta", {}).get("content", "")
        if delta:
            full_text += delta
            print(delta, end="", flush=True)
        if "usage" in chunk:
            usage = chunk["usage"]

print("\n")
print("Final text:", full_text)
print("Usage:", usage)
Using the OpenAI Python SDK
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.solrouter.io/ai",
    api_key=os.environ["SOLROUTER_API_KEY"],
)

stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    stream=True,
    messages=[
        {"role": "user", "content": "Write a slogan for a fast AI API."}
    ],
)

for chunk in stream:
    if not chunk.choices:
        continue  # some providers send a usage-only final chunk with no choices
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
Reconstructing the final message
Streaming sends fragments, so your client usually needs to reconstruct the full output.
Simple concatenation
For plain text responses, concatenate every delta.content fragment in order:
let fullText = "";

for await (const chunk of stream) {
  const part = chunk.choices?.[0]?.delta?.content ?? "";
  fullText += part;
}
Why reconstruction matters
You often need the final assembled text for:
- Saving chat history
- Database storage
- Analytics
- Post-processing
- Markdown rendering
- Structured parsing after generation completes
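One common use: once the stream ends, append the assembled text back into your message history so the next request carries full context. A minimal sketch:

```typescript
// Sketch: append the reconstructed assistant message to the conversation
// so a follow-up request includes the full history.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

const messages: ChatMessage[] = [
  { role: "user", content: "Explain streaming in one paragraph." },
];

// Assume fullText was assembled from delta.content fragments as shown above.
const fullText = "Streaming delivers output incrementally.";
messages.push({ role: "assistant", content: fullText });

// The next user turn is appended after the assistant message.
messages.push({ role: "user", content: "Shorten that to one sentence." });
```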
Streaming with tool calling
When tool calling is enabled, streamed chunks may include partial tool_calls data instead of plain text.
Example request
{
  "model": "openai/gpt-4o-mini",
  "stream": true,
  "messages": [
    {
      "role": "user",
      "content": "What's the weather in Tokyo?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Returns weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    }
  ]
}
Example streamed tool chunks
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_1","type":"function","function":{"name":"get_weather","arguments":""}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"city\":\"To"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"kyo\"}"}}]}}]}
data: {"choices":[{"delta":{},"finish_reason":"tool_calls"}]}
Reconstructing tool arguments
Tool arguments may arrive in pieces and must be concatenated per tool call, keyed by each fragment's index field.
const toolCalls: Record<number, { id?: string; name?: string; arguments: string }> = {};

for await (const chunk of stream) {
  const calls = chunk.choices?.[0]?.delta?.tool_calls ?? [];
  for (const call of calls) {
    const index = call.index ?? 0;
    if (!toolCalls[index]) {
      toolCalls[index] = { arguments: "" };
    }
    if (call.id) {
      toolCalls[index].id = call.id;
    }
    if (call.function?.name) {
      toolCalls[index].name = call.function.name;
    }
    if (call.function?.arguments) {
      toolCalls[index].arguments += call.function.arguments;
    }
  }
}
After the stream ends, parse the reconstructed argument string as JSON.
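For example, with the Tokyo stream shown earlier, the argument fragments only form valid JSON once concatenated in order (a sketch; fragment boundaries vary by provider and model):

```typescript
// The argument fragments from the example stream above, concatenated in order.
const fragments = ["", '{"city":"To', 'kyo"}'];
const argumentString = fragments.join("");

// Only parse after finish_reason is "tool_calls" — a partial string is not valid JSON.
const args = JSON.parse(argumentString) as { city: string };
console.log(args.city); // prints "Tokyo"
```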
Streaming in React
A typical React pattern is to append incoming chunks into component state.
import { useState } from "react";

export function useStreamingChat() {
  const [text, setText] = useState("");
  const [loading, setLoading] = useState(false);

  async function run(prompt: string) {
    setLoading(true);
    setText("");

    const response = await fetch("https://api.solrouter.io/ai/chat/completions", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.NEXT_PUBLIC_SOLROUTER_API_KEY}`,
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
      },
      body: JSON.stringify({
        model: "openai/gpt-4o-mini",
        stream: true,
        messages: [{ role: "user", content: prompt }],
      }),
    });

    if (!response.body) {
      setLoading(false);
      throw new Error("Missing response body");
    }

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";

    while (true) {
      const { value, done } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const events = buffer.split("\n\n");
      buffer = events.pop() ?? "";

      for (const event of events) {
        const line = event.trim();
        if (!line.startsWith("data: ")) continue;
        const payload = line.slice(6);
        if (payload === "[DONE]") continue;

        const chunk = JSON.parse(payload);
        const delta = chunk.choices?.[0]?.delta?.content ?? "";
        if (delta) {
          setText((prev) => prev + delta);
        }
      }
    }

    setLoading(false);
  }

  return { text, loading, run };
}
In production, do not expose a private server-side API key directly to the browser. Prefer a server-side route handler or backend proxy for client applications.
Error handling
Streaming requests can fail in two ways:
- The HTTP request fails before streaming begins
- The stream starts, then ends early or sends an error payload
Check the initial HTTP status
if (!response.ok) {
  const text = await response.text();
  throw new Error(`Request failed: ${response.status} ${text}`);
}
Common failure scenarios
| Problem | What it means | What to do |
|---|---|---|
| 401 Unauthorized | Missing or invalid API key | Verify Authorization: Bearer sr_... |
| 402 Payment Required | No balance for paid model | Top up credits or use a free model |
| 404 Not Found | Invalid path or unknown model | Check endpoint and model ID |
| 429 Too Many Requests | Rate limit reached | Retry with backoff |
| Early connection close | Upstream provider interruption | Retry idempotent requests |
| Malformed chunk parsing | Buffer split incorrectly | Use proper SSE parsing logic |
Retry strategy
For network failures and 429 / 5xx responses, use exponential backoff.
class NonRetryableError extends Error {}

async function streamWithRetry(makeRequest: () => Promise<Response>, retries = 3) {
  let attempt = 0;

  while (true) {
    try {
      const response = await makeRequest();
      if (response.ok) {
        return response;
      }
      if (response.status < 500 && response.status !== 429) {
        // Client errors other than 429 will not succeed on retry.
        throw new NonRetryableError(`Non-retryable error: ${response.status}`);
      }
      throw new Error(`Retryable error: ${response.status}`);
    } catch (error) {
      if (error instanceof NonRetryableError) {
        throw error; // do not retry client errors
      }
      attempt += 1;
      if (attempt > retries) {
        throw error;
      }
      const delayMs = 500 * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
Best practices
1. Always parse SSE incrementally
Do not assume each network read contains a full JSON object. Buffer partial data and split on SSE event boundaries.
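The buffering pattern used throughout this page can be isolated into a small pure helper (a sketch; the name feedSseBuffer is ours):

```typescript
// Sketch of incremental SSE buffering: feed each network read into the buffer,
// emit only complete events (separated by a blank line), and keep the remainder
// for the next read.
function feedSseBuffer(buffer: string, read: string): { events: string[]; buffer: string } {
  const combined = buffer + read;
  const parts = combined.split("\n\n");
  const remainder = parts.pop() ?? "";
  return { events: parts, buffer: remainder };
}

// A JSON payload split across two reads is only emitted once it is complete.
let state = feedSseBuffer("", 'data: {"choices":[{"delta":{"con');
console.log(state.events.length); // prints 0 — the event is still incomplete
state = feedSseBuffer(state.buffer, 'tent":"Hi"},"index":0}]}\n\n');
console.log(state.events[0]); // the full event, safe to parse after stripping "data: "
```

Because the helper is pure, it is easy to unit test against reads split at arbitrary byte boundaries.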
2. Store the final assembled output
Even if you render partial text live, keep a reconstructed final string for persistence and analytics.
3. Capture usage from the final chunk
Usage data is typically only available near the end of the stream. Save it separately from the text buffer.
4. Handle tool calls as structured state
Tool arguments may arrive in fragments. Reconstruct them by call index rather than treating them like plain text.
5. Use streaming for UX, not just speed
Streaming often does not reduce total compute time, but it dramatically improves perceived responsiveness.
6. Keep prompts concise
Long prompts increase time-to-first-token and may reduce the UX benefit of streaming.
7. Prefer server-side streaming in production apps
For web apps, proxy streaming through your backend or route handlers when possible. This keeps your private credentials off the client.
Minimal end-to-end example
const response = await fetch("https://api.solrouter.io/ai/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SOLROUTER_API_KEY}`,
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
  },
  body: JSON.stringify({
    model: "anthropic/claude-sonnet-4",
    stream: true,
    messages: [
      { role: "user", content: "Give me three short tips for building a CLI." }
    ],
  }),
});

if (!response.ok || !response.body) {
  throw new Error(`Request failed: ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let result = "";

while (true) {
  const { value, done } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const events = buffer.split("\n\n");
  buffer = events.pop() ?? "";

  for (const event of events) {
    const line = event.trim();
    if (!line.startsWith("data: ")) continue;
    const data = line.slice(6);
    if (data === "[DONE]") continue;

    const chunk = JSON.parse(data);
    const delta = chunk.choices?.[0]?.delta?.content ?? "";
    if (delta) {
      result += delta;
      process.stdout.write(delta);
    }
  }
}

console.log("\n\nFinal result:", result);
Next steps
- API Reference — request and response schemas across endpoints
- Tool Calling — full function execution workflow
- Structured Output — JSON mode and schema-constrained responses
- Vision & Multimodal — sending images and mixed content blocks
- Errors — retry strategy, rate limits, and failure handling