Python Guide


This guide shows how to use SolRouter from Python for chat, streaming, structured output, tool calling, and multimodal workflows.

SolRouter exposes an OpenAI-compatible API, which means you can use any of the following:

  • the official openai Python SDK
  • plain httpx or requests
  • validation libraries like Pydantic
  • async frameworks such as FastAPI

Base URL

https://api.solrouter.io/ai

Installation and environment setup

Install the OpenAI Python SDK:

pip install openai

Set your API key in an environment variable:

export SOLROUTER_API_KEY=sr_your_api_key

Or load it from a .env file with python-dotenv:

pip install python-dotenv

Then load it in your code:

from dotenv import load_dotenv
import os

load_dotenv()

api_key = os.environ["SOLROUTER_API_KEY"]

Basic chat completion

The simplest way to call SolRouter from Python is through the OpenAI SDK.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.solrouter.io/ai",
    api_key=os.environ["SOLROUTER_API_KEY"],
)

completion = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain what an API gateway does."},
    ],
)

print(completion.choices[0].message.content)
print(completion.usage)

What changed from a standard OpenAI setup

Only two things:

  • base_url="https://api.solrouter.io/ai"
  • your SolRouter key with the sr_ prefix

Everything else stays effectively the same.


Using httpx directly

If you prefer not to use the SDK, you can call the API with httpx.

Install it:

pip install httpx

Then send a request:

import httpx
import os

url = "https://api.solrouter.io/ai/chat/completions"

headers = {
    "Authorization": f"Bearer {os.environ['SOLROUTER_API_KEY']}",
    "Content-Type": "application/json",
}

payload = {
    "model": "anthropic/claude-sonnet-4",
    "messages": [
        {"role": "user", "content": "Write a one-sentence summary of structured output."}
    ],
}

response = httpx.post(url, headers=headers, json=payload, timeout=60.0)
response.raise_for_status()

data = response.json()

print(data["choices"][0]["message"]["content"])
print(data["usage"])

This is useful when you want full control over HTTP behavior, timeouts, retries, or custom middleware.


Inspecting the response

A typical successful response includes:

  • choices
  • message.content
  • finish_reason
  • usage

Example:

completion = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Say hello in one sentence."}
    ],
)

message = completion.choices[0].message.content
finish_reason = completion.choices[0].finish_reason
usage = completion.usage

print("Message:", message)
print("Finish reason:", finish_reason)
print("Prompt tokens:", usage.prompt_tokens)
print("Completion tokens:", usage.completion_tokens)
print("Total tokens:", usage.total_tokens)

If supported by the selected model and route, usage may also include a cost field in the raw response payload.


Streaming responses

Streaming is ideal for chat interfaces, long generations, and responsive UIs.

Streaming with the OpenAI SDK

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.solrouter.io/ai",
    api_key=os.environ["SOLROUTER_API_KEY"],
)

stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    stream=True,
    messages=[
        {"role": "user", "content": "Write a short paragraph about low-latency APIs."}
    ],
)

for chunk in stream:
    if not chunk.choices:
        continue  # some chunks (for example, a trailing usage chunk) carry no choices
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

Streaming with httpx

import json
import os
import httpx

url = "https://api.solrouter.io/ai/chat/completions"

headers = {
    "Authorization": f"Bearer {os.environ['SOLROUTER_API_KEY']}",
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
}

payload = {
    "model": "openai/gpt-4o-mini",
    "stream": True,
    "messages": [
        {"role": "user", "content": "Explain streaming in two sentences."}
    ],
}

full_text = ""

with httpx.stream("POST", url, headers=headers, json=payload, timeout=60.0) as response:
    response.raise_for_status()

    for line in response.iter_lines():
        if not line or not line.startswith("data: "):
            continue

        data = line[6:]

        if data == "[DONE]":
            break

        chunk = json.loads(data)
        delta = chunk.get("choices", [{}])[0].get("delta", {}).get("content", "")

        if delta:
            full_text += delta
            print(delta, end="", flush=True)

print("\n")
print("Final text:", full_text)

Practical streaming tips

  • always check the initial HTTP status before reading the stream
  • handle partial output gracefully
  • do not assume every network chunk contains a complete JSON object
  • expect the final usage data near the end of the stream, not the beginning

For deeper streaming behavior, see Streaming.
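httpx's iter_lines() already buffers partial lines for you, but if you ever read the raw stream yourself, the buffering the third tip calls for can be sketched as a small generator (illustrative code, not part of any SDK):

```python
import json

def iter_sse_events(raw_chunks):
    """Yield parsed JSON payloads from an SSE text stream.

    Buffers input so a JSON object split across network chunks is
    only parsed once the full "data:" line has arrived.
    """
    buffer = ""
    for raw in raw_chunks:
        buffer += raw
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            line = line.strip()
            if not line.startswith("data: "):
                continue
            data = line[len("data: "):]
            if data == "[DONE]":
                return
            yield json.loads(data)

# One JSON object deliberately split across two "network" chunks:
chunks = [
    'data: {"choices": [{"delta": {"content": "Hel',
    'lo"}}]}\n',
    "data: [DONE]\n",
]
events = list(iter_sse_events(chunks))
print(events[0]["choices"][0]["delta"]["content"])  # Hello
```

The key point: parse only complete lines, and keep the remainder in the buffer for the next chunk.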


Structured output with Pydantic

Structured output is one of the strongest Python integration patterns because it combines model generation with runtime validation.

Install Pydantic if you do not already have it (the email extra is needed for EmailStr validation):

pip install "pydantic[email]"

Example: extracting a typed contact record

from openai import OpenAI
from pydantic import BaseModel, EmailStr
import json
import os

class ContactRecord(BaseModel):
    name: str
    email: EmailStr
    company: str

client = OpenAI(
    base_url="https://api.solrouter.io/ai",
    api_key=os.environ["SOLROUTER_API_KEY"],
)

completion = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "Extract name, email, and company from: Sarah Chen, sarah@acme.io, Acme Labs"
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "contact_record",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                    "company": {"type": "string"}
                },
                "required": ["name", "email", "company"],
                "additionalProperties": False
            }
        }
    }
)

raw = completion.choices[0].message.content or "{}"
parsed = ContactRecord.model_validate(json.loads(raw))

print(parsed)

Why this pattern is strong

  • the model is guided into a predictable shape
  • your application validates the result before use
  • malformed output fails early and safely
  • the validated object can go straight into business logic

For more detail, see Structured Output.
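Failing early in practice means catching the validation error and deciding what to do before bad data spreads. A minimal sketch (using plain str for the email field so it runs without the email extra; parse_contact is an illustrative name):

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class ContactRecord(BaseModel):
    name: str
    email: str  # plain str here so the sketch needs no email extra
    company: str

def parse_contact(raw: str) -> Optional[ContactRecord]:
    """Validate model output before it reaches business logic."""
    try:
        return ContactRecord.model_validate_json(raw)
    except ValidationError:
        return None  # fail early: log, retry the request, or surface an error

good = parse_contact('{"name": "Sarah Chen", "email": "sarah@acme.io", "company": "Acme Labs"}')
bad = parse_contact('{"name": "Sarah Chen"}')  # missing required fields

print(good.company)  # Acme Labs
print(bad)           # None
```

In production you might retry the model call with the validation error appended to the prompt instead of returning None.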


Tool calling in Python

Tool calling lets the model request a function that your Python application executes.

Example workflow

from openai import OpenAI
import json
import os

client = OpenAI(
    base_url="https://api.solrouter.io/ai",
    api_key=os.environ["SOLROUTER_API_KEY"],
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Returns the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"],
                "additionalProperties": False
            }
        }
    }
]

def get_weather(city: str) -> dict:
    return {
        "city": city,
        "temperature_c": 18,
        "condition": "Cloudy",
        "wind_kph": 12,
    }

first = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {"role": "user", "content": "What's the weather in Berlin?"}
    ],
    tools=tools,
)

assistant_message = first.choices[0].message

if not assistant_message.tool_calls:
    raise RuntimeError("Model answered directly instead of calling a tool")

tool_call = assistant_message.tool_calls[0]

args = json.loads(tool_call.function.arguments)
tool_result = get_weather(args["city"])

second = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {"role": "user", "content": "What's the weather in Berlin?"},
        assistant_message,
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(tool_result),
        },
    ],
    tools=tools,
)

print(second.choices[0].message.content)

Best practices for Python tool calling

  • validate arguments before executing a function
  • never dynamically dispatch arbitrary tool names without checks
  • return structured JSON for tool results
  • keep tools focused and narrow
  • run tools server-side, not in public client code

For the full workflow, see Tool Calling.
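The first two best practices above can be combined in a small dispatch registry: only registered tool names are callable, and arguments are validated with Pydantic before any function runs (a sketch with illustrative names):

```python
from pydantic import BaseModel, ValidationError

class WeatherArgs(BaseModel):
    city: str

def run_get_weather(args: WeatherArgs) -> dict:
    return {"city": args.city, "temperature_c": 18}

# Explicit registry: only tool names listed here can ever be dispatched.
TOOL_REGISTRY = {
    "get_weather": (WeatherArgs, run_get_weather),
}

def dispatch_tool(name: str, raw_arguments: str) -> dict:
    entry = TOOL_REGISTRY.get(name)
    if entry is None:
        return {"error": f"unknown tool: {name}"}
    schema, handler = entry
    try:
        args = schema.model_validate_json(raw_arguments)
    except ValidationError:
        return {"error": "invalid arguments"}
    return handler(args)

print(dispatch_tool("get_weather", '{"city": "Berlin"}'))
print(dispatch_tool("drop_tables", "{}"))  # rejected: not in the registry
```

Returning a structured error instead of raising lets you feed the failure back to the model as a tool result.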


Vision and multimodal requests

Python is a great fit for extraction workflows involving images, documents, and other media.

Example with a remote image

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.solrouter.io/ai",
    api_key=os.environ["SOLROUTER_API_KEY"],
)

completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Extract the invoice number and total from this image."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/invoice.jpg",
                        "detail": "high"
                    }
                }
            ]
        }
    ]
)

print(completion.choices[0].message.content)
print(completion.usage)

Example with a local image file

import base64
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.solrouter.io/ai",
    api_key=os.environ["SOLROUTER_API_KEY"],
)

with open("invoice.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

data_url = f"data:image/jpeg;base64,{encoded}"

completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Extract the invoice number and total from this image."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": data_url,
                        "detail": "high"
                    }
                }
            ]
        }
    ]
)

print(completion.choices[0].message.content)

When to use multimodal from Python

Python is especially strong for:

  • OCR pipelines
  • document extraction
  • batch processing
  • data enrichment
  • backend automation jobs
  • ETL-style AI workflows

For modality-specific details, see Vision & Multimodal.
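In batch pipelines it helps to factor the base64 encoding shown above into a small helper that turns raw bytes into a ready-to-use content part (a sketch; image_part is an illustrative name):

```python
import base64

def image_part(data: bytes, mime: str = "image/jpeg", detail: str = "high") -> dict:
    """Build an image_url content part from raw image bytes."""
    encoded = base64.b64encode(data).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}", "detail": detail},
    }

# In a batch job you would read each file and append one part per image:
part = image_part(b"\xff\xd8\xff")  # first bytes of a JPEG, for illustration
print(part["image_url"]["url"])  # data:image/jpeg;base64,/9j/
```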


Building a reusable Python client wrapper

In a real application, it helps to centralize client setup and request defaults.

from openai import OpenAI
import os

class SolRouterClient:
    def __init__(self):
        api_key = os.environ.get("SOLROUTER_API_KEY")
        if not api_key:
            raise RuntimeError("Missing SOLROUTER_API_KEY")

        self.client = OpenAI(
            base_url="https://api.solrouter.io/ai",
            api_key=api_key,
        )

    def chat(self, prompt: str, model: str = "openai/gpt-4o-mini") -> str:
        completion = self.client.chat.completions.create(
            model=model,
            messages=[
                {"role": "user", "content": prompt}
            ],
        )
        return completion.choices[0].message.content or ""

solrouter = SolRouterClient()
print(solrouter.chat("Explain what a context window is."))

You can extend this wrapper with:

  • retries
  • logging
  • tracing
  • default system prompts
  • schema validation helpers
  • multimodal utilities
  • rate limiting

Retry strategy with httpx

Transient failures like 408, 429, 500, 502, and 503 should usually be retried with exponential backoff.

import time
import httpx
import os

def request_with_retry(payload: dict, retries: int = 3):
    url = "https://api.solrouter.io/ai/chat/completions"
    headers = {
        "Authorization": f"Bearer {os.environ['SOLROUTER_API_KEY']}",
        "Content-Type": "application/json",
    }

    attempt = 0

    while True:
        response = httpx.post(url, headers=headers, json=payload, timeout=60.0)

        if response.status_code < 400:
            return response

        retryable = response.status_code in {408, 429, 500, 502, 503}

        if not retryable:
            return response

        attempt += 1
        if attempt > retries:
            return response

        delay = 0.5 * (2 ** (attempt - 1))
        time.sleep(delay)

payload = {
    "model": "openai/gpt-4o-mini",
    "messages": [
        {"role": "user", "content": "Give me three short tips for writing CLI tools."}
    ],
}

response = request_with_retry(payload)
print(response.status_code)
print(response.json())

When not to retry

Do not automatically retry:

  • malformed requests
  • authentication failures
  • insufficient balance errors
  • invalid schemas
  • model-not-found errors

For more detail, see Errors.


FastAPI integration pattern

Python teams often use SolRouter inside FastAPI services.

Install FastAPI and Uvicorn:

pip install fastapi uvicorn openai

Example FastAPI endpoint

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import OpenAI
import os

app = FastAPI()

client = OpenAI(
    base_url="https://api.solrouter.io/ai",
    api_key=os.environ["SOLROUTER_API_KEY"],
)

class ChatRequest(BaseModel):
    prompt: str

@app.post("/chat")
def chat(req: ChatRequest):
    try:
        completion = client.chat.completions.create(
            model="openai/gpt-4o-mini",
            messages=[
                {"role": "user", "content": req.prompt}
            ],
        )
        return {
            "content": completion.choices[0].message.content,
            "usage": completion.usage,
        }
    except Exception as exc:
        raise HTTPException(status_code=500, detail=str(exc))

Run it with:

uvicorn main:app --reload

This pattern is useful when you want:

  • your own backend auth
  • server-side API key isolation
  • request logging
  • internal rate limiting
  • usage metering
  • workflow orchestration

Common mistakes

1. Forgetting base_url

If you omit the SolRouter base URL, the SDK will send requests to the default OpenAI endpoint instead of SolRouter.

Wrong:

client = OpenAI(api_key=os.environ["SOLROUTER_API_KEY"])

Correct:

client = OpenAI(
    base_url="https://api.solrouter.io/ai",
    api_key=os.environ["SOLROUTER_API_KEY"],
)

2. Not validating structured output

Always validate JSON output before using it in your business logic.

3. Treating tool arguments as trusted

Tool call arguments are model-generated and must be validated.

4. Sending unsupported modalities to text-only models

Always verify that the selected model supports image, file, audio, or video input.

5. Logging secrets accidentally

Do not log:

  • raw API keys
  • bearer headers
  • private documents
  • user-uploaded sensitive media
  • full sensitive prompts unless strictly necessary

6. Retrying non-retryable failures

Do not keep retrying malformed payloads or insufficient-balance errors.

7. Ignoring token usage

Python data pipelines can quietly become expensive if you do not track usage and cost.
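A minimal way to keep usage visible is to accumulate token counts per model as responses come back (a sketch; UsageTracker is an illustrative name):

```python
from collections import defaultdict

class UsageTracker:
    """Accumulate token counts per model so pipeline cost stays visible."""

    def __init__(self):
        self.totals = defaultdict(lambda: {"prompt": 0, "completion": 0})

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> None:
        self.totals[model]["prompt"] += prompt_tokens
        self.totals[model]["completion"] += completion_tokens

tracker = UsageTracker()
# After each call: tracker.record(model, usage.prompt_tokens, usage.completion_tokens)
tracker.record("openai/gpt-4o-mini", 120, 45)
tracker.record("openai/gpt-4o-mini", 80, 30)
print(tracker.totals["openai/gpt-4o-mini"])  # {'prompt': 200, 'completion': 75}
```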


Recommended production pattern

A strong Python production stack often looks like this:

  • OpenAI SDK for API compatibility and convenience
  • Pydantic for validation
  • httpx where lower-level control is needed
  • FastAPI for service integration
  • environment variables for secrets
  • server-side execution only for private workflows
  • structured logging for error diagnostics
  • retry with backoff for transient failures

A practical architecture

FastAPI / worker / job runner
        ↓
  validated request payload
        ↓
  SolRouter client wrapper
        ↓
 https://api.solrouter.io/ai
        ↓
 parsed + validated response
        ↓
 database / API / UI

Minimal robust helper

from openai import OpenAI
import os
import json

class ChatFailure(Exception):
    pass

class SolRouter:
    def __init__(self):
        api_key = os.environ.get("SOLROUTER_API_KEY")
        if not api_key:
            raise RuntimeError("Missing SOLROUTER_API_KEY")

        self.client = OpenAI(
            base_url="https://api.solrouter.io/ai",
            api_key=api_key,
        )

    def chat(self, messages, model="openai/gpt-4o-mini"):
        try:
            completion = self.client.chat.completions.create(
                model=model,
                messages=messages,
            )
            return {
                "content": completion.choices[0].message.content,
                "usage": completion.usage,
            }
        except Exception as exc:
            raise ChatFailure(str(exc)) from exc

solrouter = SolRouter()

result = solrouter.chat([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain what retry with backoff means."},
])

print(json.dumps(result, default=str, indent=2))

This gives you one clean place to:

  • set defaults
  • add metrics
  • normalize exceptions
  • inject tracing
  • attach retry logic
  • standardize logging

Next steps