Python Guide
This guide shows how to use SolRouter from Python for chat, streaming, structured output, tool calling, and multimodal workflows.
SolRouter exposes an OpenAI-compatible API, which means you can use any of the following:
- the official openai Python SDK
- plain httpx or requests
- validation libraries like Pydantic
- async frameworks such as FastAPI
Base URL
https://api.solrouter.io/ai
Installation and environment setup
Install the OpenAI Python SDK:
pip install openai
Set your API key in an environment variable:
export SOLROUTER_API_KEY=sr_your_api_key
Or load it from a .env file with python-dotenv:
pip install python-dotenv
from dotenv import load_dotenv
import os
load_dotenv()
api_key = os.environ["SOLROUTER_API_KEY"]
Basic chat completion
The simplest way to call SolRouter from Python is through the OpenAI SDK.
from openai import OpenAI
import os
client = OpenAI(
base_url="https://api.solrouter.io/ai",
api_key=os.environ["SOLROUTER_API_KEY"],
)
completion = client.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a concise assistant."},
{"role": "user", "content": "Explain what an API gateway does."},
],
)
print(completion.choices[0].message.content)
print(completion.usage)
What changed from a standard OpenAI setup
Only two things:
base_url="https://api.solrouter.io/ai"- your SolRouter key with the
sr_prefix
Everything else stays effectively the same.
Using httpx directly
If you prefer not to use the SDK, you can call the API with httpx.
Install it:
pip install httpx
Then send a request:
import httpx
import os
url = "https://api.solrouter.io/ai/chat/completions"
headers = {
"Authorization": f"Bearer {os.environ['SOLROUTER_API_KEY']}",
"Content-Type": "application/json",
}
payload = {
"model": "anthropic/claude-sonnet-4",
"messages": [
{"role": "user", "content": "Write a one-sentence summary of structured output."}
],
}
response = httpx.post(url, headers=headers, json=payload, timeout=60.0)
response.raise_for_status()
data = response.json()
print(data["choices"][0]["message"]["content"])
print(data["usage"])
This is useful when you want full control over HTTP behavior, timeouts, retries, or custom middleware.
Inspecting the response
A typical successful response includes:
- choices
- message.content
- finish_reason
- usage
Example:
completion = client.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[
{"role": "user", "content": "Say hello in one sentence."}
],
)
message = completion.choices[0].message.content
finish_reason = completion.choices[0].finish_reason
usage = completion.usage
print("Message:", message)
print("Finish reason:", finish_reason)
print("Prompt tokens:", usage.prompt_tokens)
print("Completion tokens:", usage.completion_tokens)
print("Total tokens:", usage.total_tokens)
If supported by the selected model and route, usage may also include a cost field in the raw response payload.
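Because the SDK returns typed response objects, one way to check for such extra fields is to dump the full payload to a dict and read it defensively. This is a sketch only; the cost field is route-dependent and may be absent entirely.
raw = completion.model_dump()                  # full response as a plain dict
cost = (raw.get("usage") or {}).get("cost")    # assumption: cost appears under usage when the route provides it
if cost is not None:
    print("Cost:", cost)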
Streaming responses
Streaming is ideal for chat interfaces, long generations, and responsive UIs.
Streaming with the OpenAI SDK
from openai import OpenAI
import os
client = OpenAI(
base_url="https://api.solrouter.io/ai",
api_key=os.environ["SOLROUTER_API_KEY"],
)
stream = client.chat.completions.create(
model="openai/gpt-4o-mini",
stream=True,
messages=[
{"role": "user", "content": "Write a short paragraph about low-latency APIs."}
],
)
for chunk in stream:
    if not chunk.choices:  # skip chunks that carry no delta (for example trailing usage data)
        continue
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
Streaming with httpx
import json
import os
import httpx
url = "https://api.solrouter.io/ai/chat/completions"
headers = {
"Authorization": f"Bearer {os.environ['SOLROUTER_API_KEY']}",
"Content-Type": "application/json",
"Accept": "text/event-stream",
}
payload = {
"model": "openai/gpt-4o-mini",
"stream": True,
"messages": [
{"role": "user", "content": "Explain streaming in two sentences."}
],
}
full_text = ""
with httpx.stream("POST", url, headers=headers, json=payload, timeout=60.0) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line or not line.startswith("data: "):
            continue
        data = line[6:]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # Some chunks (for example trailing usage data) may have an empty choices list.
        choices = chunk.get("choices") or [{}]
        delta = choices[0].get("delta", {}).get("content", "")
        if delta:
            full_text += delta
            print(delta, end="", flush=True)

print("\n")
print("Final text:", full_text)
Practical streaming tips
- always check the initial HTTP status before reading the stream
- handle partial output gracefully
- do not assume every network chunk contains a complete JSON object
- expect the final usage data near the end of the stream, not the beginning
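One way to pick up that trailing usage with a recent OpenAI SDK is to request it explicitly. This is a sketch; whether stream_options is honored depends on the model and route you select.
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.solrouter.io/ai",
    api_key=os.environ["SOLROUTER_API_KEY"],
)

stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    stream=True,
    stream_options={"include_usage": True},  # assumption: the selected route forwards a usage chunk
    messages=[
        {"role": "user", "content": "Explain backpressure in one sentence."}
    ],
)

usage = None
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    if chunk.usage is not None:  # usually arrives in a final chunk with no choices
        usage = chunk.usage

print()
print("Usage:", usage)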
For deeper streaming behavior, see Streaming.
Structured output with Pydantic
Structured output is one of the strongest Python integration patterns because it combines model generation with runtime validation.
Install Pydantic if you do not already have it. The EmailStr field used below also requires the email extra:
pip install "pydantic[email]"
Example: extracting a typed contact record
from openai import OpenAI
from pydantic import BaseModel, EmailStr
import json
import os
class ContactRecord(BaseModel):
    name: str
    email: EmailStr
    company: str
client = OpenAI(
base_url="https://api.solrouter.io/ai",
api_key=os.environ["SOLROUTER_API_KEY"],
)
completion = client.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[
{
"role": "user",
"content": "Extract name, email, and company from: Sarah Chen, sarah@acme.io, Acme Labs"
}
],
response_format={
"type": "json_schema",
"json_schema": {
"name": "contact_record",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"email": {"type": "string"},
"company": {"type": "string"}
},
"required": ["name", "email", "company"],
"additionalProperties": False
}
}
}
)
raw = completion.choices[0].message.content or "{}"
parsed = ContactRecord.model_validate(json.loads(raw))
print(parsed)
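If the model returns something that is not valid JSON or does not match the schema, validation should fail before the data reaches your business logic. A minimal guard around the parsing step above:
from pydantic import ValidationError

try:
    parsed = ContactRecord.model_validate(json.loads(raw))
except (json.JSONDecodeError, ValidationError) as exc:
    # Fail early: never pass unvalidated model output into business logic.
    raise RuntimeError(f"Model returned an invalid contact record: {exc}") from exc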
Why this pattern is strong
- the model is guided into a predictable shape
- your application validates the result before use
- malformed output fails early and safely
- the validated object can go straight into business logic
For more detail, see Structured Output.
Tool calling in Python
Tool calling lets the model request a function that your Python application executes.
Example workflow
from openai import OpenAI
import json
import os
client = OpenAI(
base_url="https://api.solrouter.io/ai",
api_key=os.environ["SOLROUTER_API_KEY"],
)
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Returns the current weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string"}
},
"required": ["city"],
"additionalProperties": False
}
}
}
]
def get_weather(city: str) -> dict:
    # Stand-in for a real weather lookup; returns a fixed payload for the example.
    return {
        "city": city,
        "temperature_c": 18,
        "condition": "Cloudy",
        "wind_kph": 12,
    }
first = client.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[
{"role": "user", "content": "What's the weather in Berlin?"}
],
tools=tools,
)
assistant_message = first.choices[0].message
tool_call = assistant_message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
tool_result = get_weather(args["city"])
second = client.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[
{"role": "user", "content": "What's the weather in Berlin?"},
assistant_message,
{
"role": "tool",
"tool_call_id": tool_call.id,
"content": json.dumps(tool_result),
},
],
tools=tools,
)
print(second.choices[0].message.content)
Best practices for Python tool calling
- validate arguments before executing a function
- never dynamically dispatch arbitrary tool names without checks
- return structured JSON for tool results
- keep tools focused and narrow
- run tools server-side, not in public client code
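The first two points can be combined into a small dispatch helper. The sketch below uses a fixed registry plus Pydantic validation; the WeatherArgs model is illustrative, and get_weather is the function from the example above.
from pydantic import BaseModel, ValidationError

class WeatherArgs(BaseModel):
    city: str

# Only tools listed here can ever be executed.
TOOL_REGISTRY = {
    "get_weather": (WeatherArgs, get_weather),
}

def run_tool(tool_call) -> dict:
    entry = TOOL_REGISTRY.get(tool_call.function.name)
    if entry is None:
        raise ValueError(f"Unknown tool requested: {tool_call.function.name}")
    args_model, func = entry
    try:
        args = args_model.model_validate_json(tool_call.function.arguments)
    except ValidationError as exc:
        raise ValueError(f"Invalid tool arguments: {exc}") from exc
    return func(**args.model_dump())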
For the full workflow, see Tool Calling.
Vision and multimodal requests
Python is a great fit for extraction workflows involving images, documents, and other media.
Example with a remote image
from openai import OpenAI
import os
client = OpenAI(
base_url="https://api.solrouter.io/ai",
api_key=os.environ["SOLROUTER_API_KEY"],
)
completion = client.chat.completions.create(
model="openai/gpt-4o",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "Extract the invoice number and total from this image."
},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/invoice.jpg",
"detail": "high"
}
}
]
}
]
)
print(completion.choices[0].message.content)
print(completion.usage)
Example with a local image file
import base64
from openai import OpenAI
import os
client = OpenAI(
base_url="https://api.solrouter.io/ai",
api_key=os.environ["SOLROUTER_API_KEY"],
)
with open("invoice.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

data_url = f"data:image/jpeg;base64,{encoded}"
completion = client.chat.completions.create(
model="openai/gpt-4o",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "Extract the invoice number and total from this image."
},
{
"type": "image_url",
"image_url": {
"url": data_url,
"detail": "high"
}
}
]
}
]
)
print(completion.choices[0].message.content)
When to use multimodal from Python
Python is especially strong for:
- OCR pipelines
- document extraction
- batch processing
- data enrichment
- backend automation jobs
- ETL-style AI workflows
For modality-specific details, see Vision & Multimodal.
Building a reusable Python client wrapper
In a real application, it helps to centralize client setup and request defaults.
from openai import OpenAI
import os
class SolRouterClient:
    def __init__(self):
        api_key = os.environ.get("SOLROUTER_API_KEY")
        if not api_key:
            raise RuntimeError("Missing SOLROUTER_API_KEY")
        self.client = OpenAI(
            base_url="https://api.solrouter.io/ai",
            api_key=api_key,
        )

    def chat(self, prompt: str, model: str = "openai/gpt-4o-mini") -> str:
        completion = self.client.chat.completions.create(
            model=model,
            messages=[
                {"role": "user", "content": prompt}
            ],
        )
        return completion.choices[0].message.content or ""
solrouter = SolRouterClient()
print(solrouter.chat("Explain what a context window is."))
You can extend this wrapper with:
- retries
- logging
- tracing
- default system prompts
- schema validation helpers
- multimodal utilities
- rate limiting
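For example, the OpenAI SDK client already accepts max_retries and timeout arguments, so basic resilience and a default system prompt can live in one place. This is a sketch extending the class above; the specific defaults are illustrative.
from openai import OpenAI
import os

class SolRouterClient:
    def __init__(self, default_system: str = "You are a concise assistant."):
        api_key = os.environ.get("SOLROUTER_API_KEY")
        if not api_key:
            raise RuntimeError("Missing SOLROUTER_API_KEY")
        self.default_system = default_system
        self.client = OpenAI(
            base_url="https://api.solrouter.io/ai",
            api_key=api_key,
            max_retries=3,   # SDK-level retries for transient failures
            timeout=60.0,    # overall request timeout in seconds
        )

    def chat(self, prompt: str, model: str = "openai/gpt-4o-mini") -> str:
        completion = self.client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": self.default_system},
                {"role": "user", "content": prompt},
            ],
        )
        return completion.choices[0].message.content or ""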
Retry strategy with httpx
Transient failures like 429, 500, 502, and 503 should usually be retried with exponential backoff.
import time
import httpx
import os
def request_with_retry(payload: dict, retries: int = 3):
    url = "https://api.solrouter.io/ai/chat/completions"
    headers = {
        "Authorization": f"Bearer {os.environ['SOLROUTER_API_KEY']}",
        "Content-Type": "application/json",
    }
    attempt = 0
    while True:
        response = httpx.post(url, headers=headers, json=payload, timeout=60.0)
        if response.status_code < 400:
            return response
        retryable = response.status_code in {408, 429, 500, 502, 503}
        if not retryable:
            return response
        attempt += 1
        if attempt > retries:
            return response
        delay = 0.5 * (2 ** (attempt - 1))  # exponential backoff: 0.5s, 1s, 2s, ...
        time.sleep(delay)
payload = {
"model": "openai/gpt-4o-mini",
"messages": [
{"role": "user", "content": "Give me three short tips for writing CLI tools."}
],
}
response = request_with_retry(payload)
print(response.status_code)
print(response.json())
When not to retry
Do not automatically retry:
- malformed requests
- authentication failures
- insufficient balance errors
- invalid schemas
- model-not-found errors
For more detail, see Errors.
FastAPI integration pattern
Python teams often use SolRouter inside FastAPI services.
Install FastAPI and Uvicorn:
pip install fastapi uvicorn openai
Example FastAPI endpoint
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import OpenAI
import os
app = FastAPI()
client = OpenAI(
base_url="https://api.solrouter.io/ai",
api_key=os.environ["SOLROUTER_API_KEY"],
)
class ChatRequest(BaseModel):
    prompt: str

@app.post("/chat")
def chat(req: ChatRequest):
    try:
        completion = client.chat.completions.create(
            model="openai/gpt-4o-mini",
            messages=[
                {"role": "user", "content": req.prompt}
            ],
        )
        return {
            "content": completion.choices[0].message.content,
            "usage": completion.usage,
        }
    except Exception as exc:
        raise HTTPException(status_code=500, detail=str(exc))
Run it with:
uvicorn main:app --reload
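FastAPI endpoints can also be async, and the SDK ships an AsyncOpenAI client that works the same way. A minimal sketch:
from fastapi import FastAPI
from pydantic import BaseModel
from openai import AsyncOpenAI
import os

app = FastAPI()

client = AsyncOpenAI(
    base_url="https://api.solrouter.io/ai",
    api_key=os.environ["SOLROUTER_API_KEY"],
)

class ChatRequest(BaseModel):
    prompt: str

@app.post("/chat")
async def chat(req: ChatRequest):
    completion = await client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[
            {"role": "user", "content": req.prompt}
        ],
    )
    return {"content": completion.choices[0].message.content}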
This pattern is useful when you want:
- your own backend auth
- server-side API key isolation
- request logging
- internal rate limiting
- usage metering
- workflow orchestration
Common mistakes
1. Forgetting base_url
If you omit the SolRouter base URL, the SDK will send requests to the default OpenAI endpoint instead of SolRouter.
Wrong:
client = OpenAI(api_key=os.environ["SOLROUTER_API_KEY"])
Correct:
client = OpenAI(
base_url="https://api.solrouter.io/ai",
api_key=os.environ["SOLROUTER_API_KEY"],
)
2. Not validating structured output
Always validate JSON output before using it in your business logic.
3. Treating tool arguments as trusted
Tool call arguments are model-generated and must be validated.
4. Sending unsupported modalities to text-only models
Always verify that the selected model supports image, file, audio, or video input.
5. Logging secrets accidentally
Do not log:
- raw API keys
- bearer headers
- private documents
- user-uploaded sensitive media
- full sensitive prompts unless strictly necessary
6. Retrying non-retryable failures
Do not keep retrying malformed payloads or insufficient-balance errors.
7. Ignoring token usage
Python data pipelines can quietly become expensive if you do not track usage and cost.
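A lightweight habit is to log usage on every call, for example with the standard logging module. This sketch assumes completion is a response object from one of the examples above.
import logging

logging.basicConfig(level=logging.INFO)

usage = completion.usage
logging.info(
    "model=%s prompt_tokens=%s completion_tokens=%s total_tokens=%s",
    completion.model,
    usage.prompt_tokens,
    usage.completion_tokens,
    usage.total_tokens,
)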
Recommended production pattern
A strong Python production stack often looks like this:
- OpenAI SDK for API compatibility and convenience
- Pydantic for validation
- httpx where lower-level control is needed
- FastAPI for service integration
- environment variables for secrets
- server-side execution only for private workflows
- structured logging for error diagnostics
- retry with backoff for transient failures
A practical architecture
FastAPI / worker / job runner
↓
validated request payload
↓
SolRouter client wrapper
↓
https://api.solrouter.io/ai
↓
parsed + validated response
↓
database / API / UI
Minimal robust helper
from openai import OpenAI
import os
import json
class ChatFailure(Exception):
    pass

class SolRouter:
    def __init__(self):
        api_key = os.environ.get("SOLROUTER_API_KEY")
        if not api_key:
            raise RuntimeError("Missing SOLROUTER_API_KEY")
        self.client = OpenAI(
            base_url="https://api.solrouter.io/ai",
            api_key=api_key,
        )

    def chat(self, messages, model="openai/gpt-4o-mini"):
        try:
            completion = self.client.chat.completions.create(
                model=model,
                messages=messages,
            )
            return {
                "content": completion.choices[0].message.content,
                "usage": completion.usage,
            }
        except Exception as exc:
            raise ChatFailure(str(exc)) from exc
solrouter = SolRouter()
result = solrouter.chat([
{"role": "system", "content": "You are a concise assistant."},
{"role": "user", "content": "Explain what retry with backoff means."},
])
print(json.dumps(result, default=str, indent=2))
This gives you one clean place to:
- set defaults
- add metrics
- normalize exceptions
- inject tracing
- attach retry logic
- standardize logging
Next steps
- API Reference — complete request and response schema
- Streaming — SSE handling and partial-output patterns
- Tool Calling — full function execution flow
- Structured Output — JSON mode and schema-constrained parsing
- Vision & Multimodal — images, files, audio, and video
- Errors — retries, rate limits, and debugging