Stream responses

Stream tokens as they are generated over server-sent events by setting stream: true on an OpenAI-shaped chat request.

Streaming sends the model's output token by token as it is generated, instead of waiting for the whole response. Through FluxRouter you stream the same way you would against OpenAI: set "stream": true on a standard chat completions request to https://api.fluxrouter.ai/v1/chat/completions. The only Flux-specific parts are the base URL and the flux-auto model.

Stream a chat completion

Add "stream": true to the request body. The server responds with a text/event-stream of data: lines, each carrying a chunk in OpenAI's chat.completion.chunk shape, and finishes with a data: [DONE] line.

curl

bash
curl https://api.fluxrouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLUX_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "flux-auto",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a haiku about routing." }
    ]
  }'

The -N flag disables curl's output buffering so you see chunks as they arrive. The events look like this (the blank line that separates events on the wire is omitted here):

data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Routes"},"index":0}]}
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":" flow"},"index":0}]}
data: [DONE]
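
If you are not using an SDK, the event stream above can be decoded by hand. The following is a minimal sketch rather than a reference client: iter_sse_chunks and collect_text are hypothetical helper names, and the sample input is the two events shown above plus the blank separator lines that server-sent events use between events.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded chunk objects from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and any non-data fields or comments.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The stream ends with a literal [DONE] sentinel, which is not JSON.
        if payload == "[DONE]":
            break
        yield json.loads(payload)


def collect_text(chunks: Iterable[dict]) -> str:
    """Join the delta.content pieces of every chunk into one string."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Sample input: the events from the curl output above.
sample = [
    'data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Routes"},"index":0}]}',
    "",
    'data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":" flow"},"index":0}]}',
    "",
    "data: [DONE]",
]

print(collect_text(iter_sse_chunks(sample)))  # prints "Routes flow"
```

In a real client the lines would come from the HTTP response body instead of a list; the parsing logic stays the same.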

Python (OpenAI SDK)

The OpenAI SDK turns the event stream into an iterator. Point it at the Flux base URL and pass stream=True.

python
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",                          # your Flux key
    base_url="https://api.fluxrouter.ai/v1",   # the one line you change
)

stream = client.chat.completions.create(
    model="flux-auto",
    stream=True,
    messages=[{"role": "user", "content": "Write a haiku about routing."}],
)

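for chunk in stream:
    # A chunk can arrive with no choices or no text (for example a role-only delta); skip it.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)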
print()

Node (OpenAI SDK)

ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FLUX_API_KEY,            // your Flux key
  baseURL: "https://api.fluxrouter.ai/v1",     // the one line you change
});

const stream = await client.chat.completions.create({
  model: "flux-auto",
  stream: true,
  messages: [{ role: "user", content: "Write a haiku about routing." }],
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

Stream with the Anthropic shape

The Anthropic-compatible base URL also streams. Set stream=True (or stream: true), or use the SDK's streaming helper as in the example below, against https://api.fluxrouter.ai/anthropic, and read Anthropic's event types (content_block_delta and friends) exactly as you would when calling Anthropic directly.

python
from anthropic import Anthropic

client = Anthropic(
    api_key="sk-...",                                # your Flux key
    base_url="https://api.fluxrouter.ai/anthropic",  # the one line you change
)

with client.messages.stream(
    model="flux-auto",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about routing."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
print()
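
The text_stream helper above hides the individual events. If you consume the raw events instead, the text arrives in content_block_delta events whose delta has type text_delta. The sketch below shows one way to extract it; text_from_events is a hypothetical helper name, and the sample events are hand-written and trimmed to the fields the function reads, not captured from FluxRouter.

```python
from typing import Iterable


def text_from_events(events: Iterable[dict]) -> str:
    """Concatenate the text carried by Anthropic-style streaming events."""
    parts = []
    for event in events:
        event_type = event.get("type")
        # message_stop marks the end of the message.
        if event_type == "message_stop":
            break
        # Only content_block_delta events carry incremental output.
        if event_type != "content_block_delta":
            continue
        delta = event.get("delta", {})
        # Ignore non-text deltas (for example partial tool input).
        if delta.get("type") == "text_delta":
            parts.append(delta.get("text", ""))
    return "".join(parts)


# Hand-written sample events, for illustration only.
events = [
    {"type": "message_start", "message": {"role": "assistant"}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Routes"}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": " flow"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_stop"},
]

print(text_from_events(events))  # prints "Routes flow"
```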

Cost on streaming responses

On non-streaming responses FluxRouter returns an X-Flux-Cost-Usd header with the request cost. On streaming responses that header is absent: the final cost is only known after the response headers have already been flushed to your client, so it cannot be included. The cost still lands on your bill as usual.

If you need the per-request cost programmatically, send the request without streaming and read X-Flux-Cost-Usd from the response headers. Streaming and non-streaming requests are metered and billed identically. See Routing and pricing for the full list of transparency headers and how billing works.
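
As a rough illustration of reading the cost from a non-streaming request, the sketch below uses only the Python standard library. It assumes the header value is a plain decimal number, which this page does not specify; cost_from_headers and complete_with_cost are hypothetical helper names.

```python
import json
import urllib.request
from typing import Mapping, Optional, Tuple


def cost_from_headers(headers: Mapping[str, str]) -> Optional[float]:
    """Return X-Flux-Cost-Usd as a float, or None if absent or unparseable."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-flux-cost-usd":
            try:
                # Assumption: the value is a plain decimal string.
                return float(value)
            except ValueError:
                return None
    return None


def complete_with_cost(api_key: str, prompt: str) -> Tuple[dict, Optional[float]]:
    """Send a non-streaming chat completion and return (body, cost in USD)."""
    body = json.dumps({
        "model": "flux-auto",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    request = urllib.request.Request(
        "https://api.fluxrouter.ai/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
        return payload, cost_from_headers(dict(response.headers.items()))
```

Calling complete_with_cost(os.environ["FLUX_API_KEY"], "Write a haiku about routing.") would return the parsed response body together with the cost, or None for the cost if the header is missing.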

Notes

  • Streaming does not change routing: flux-auto still selects the model that serves the request, and the other X-Flux-* headers (model, request id) are present on the streamed response.
  • Token-level streaming behavior depends on the model you route to, but the wire format is the standard OpenAI (or Anthropic) event stream in every case.
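
Because response headers arrive before the first chunk, the X-Flux-* headers on a streamed response can be read before you start consuming the body. The sketch below filters them by prefix rather than by exact name, since this page does not spell out the individual header names; flux_headers and open_stream are hypothetical helper names.

```python
import json
import urllib.request
from typing import Dict, Iterator, Mapping, Tuple


def flux_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Keep only the X-Flux-* headers, with lowercased names."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith("x-flux-")
    }


def open_stream(api_key: str, prompt: str) -> Tuple[Dict[str, str], Iterator[str]]:
    """Start a streamed request; return its X-Flux-* headers and the raw SSE lines."""
    body = json.dumps({
        "model": "flux-auto",
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    request = urllib.request.Request(
        "https://api.fluxrouter.ai/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    response = urllib.request.urlopen(request)
    # Headers are available as soon as the response starts.
    headers = flux_headers(dict(response.headers.items()))

    def lines() -> Iterator[str]:
        # Close the connection once the body has been fully read.
        with response:
            for raw in response:
                yield raw.decode("utf-8").rstrip("\r\n")

    return headers, lines()
```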