Stream responses
Stream tokens as they are generated, over server-sent events, by setting stream: true on an OpenAI-shaped chat request.
Streaming sends the model's output token by token as it is generated, instead of waiting for the whole response. Through FluxRouter you stream the same way you would against OpenAI: set "stream": true on a standard chat completions request to https://api.fluxrouter.ai/v1/chat/completions. The only Flux-specific parts are the base URL and the flux-auto model.
Stream a chat completion
Add "stream": true to the request body. The server responds with a text/event-stream of data: lines, each carrying a chunk in OpenAI's chat.completion.chunk shape, and finishes with a data: [DONE] line.
curl
curl https://api.fluxrouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLUX_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "flux-auto",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a haiku about routing." }
    ]
  }'
The -N flag disables curl's output buffering so you see chunks as they arrive. Each event looks like:
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Routes"},"index":0}]}
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":" flow"},"index":0}]}
data: [DONE]
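If you read the event stream without an SDK, the parsing described above can be sketched roughly as follows. This is a minimal illustration, not FluxRouter's reference client: the helper name iter_sse_content and the sample lines (including the placeholder id "x") are made up for the example, and it assumes each event arrives as a single data: line.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by `data:` lines of a chat stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel, not JSON
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hypothetical sample lines shaped like the events shown above.
sample = [
    'data: {"id":"x","object":"chat.completion.chunk","choices":[{"delta":{"content":"Routes"},"index":0}]}',
    "",
    'data: {"id":"x","object":"chat.completion.chunk","choices":[{"delta":{"content":" flow"},"index":0}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))
```

In a real client the lines would come from the HTTP response body rather than a list; the important detail is to check for [DONE] before attempting to decode the payload as JSON.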
Python (OpenAI SDK)
The OpenAI SDK turns the event stream into an iterator. Point it at the Flux base URL and pass stream=True.
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",  # your Flux key
    base_url="https://api.fluxrouter.ai/v1",  # the one line you change
)

stream = client.chat.completions.create(
    model="flux-auto",
    stream=True,
    messages=[{"role": "user", "content": "Write a haiku about routing."}],
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
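If you also need the complete text once the stream ends, one option is to accumulate the deltas while iterating. The sketch below is an illustration under assumptions: collect_text and fake_chunk are hypothetical names, and the stand-in objects only mimic the attributes the loop reads. It also skips chunks whose choices list is empty, which OpenAI-style streams can emit (for example a trailing usage chunk); whether a given routed model does so is not something this page specifies.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_text(chunks: Iterable) -> str:
    """Join the content deltas of a chat completion stream into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # tolerate chunks that carry no choices
        delta = chunk.choices[0].delta.content
        if delta:  # role-only or final chunks have no content
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in with the same attribute path as an SDK chunk."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


fake_stream = [
    fake_chunk("Routes"),
    fake_chunk(None),
    SimpleNamespace(choices=[]),
    fake_chunk(" flow"),
]
print(collect_text(fake_stream))
```

With the real SDK you would pass the stream object returned by client.chat.completions.create(..., stream=True) in place of fake_stream.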
Node (OpenAI SDK)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FLUX_API_KEY, // your Flux key
  baseURL: "https://api.fluxrouter.ai/v1", // the one line you change
});

const stream = await client.chat.completions.create({
  model: "flux-auto",
  stream: true,
  messages: [{ role: "user", content: "Write a haiku about routing." }],
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
Stream with the Anthropic shape
The Anthropic-compatible base also streams. Set stream=True (or stream: true) against https://api.fluxrouter.ai/anthropic and read Anthropic's event types (content_block_delta and friends) exactly as you would calling Anthropic directly.
from anthropic import Anthropic

client = Anthropic(
    api_key="sk-...",  # your Flux key
    base_url="https://api.fluxrouter.ai/anthropic",  # the one line you change
)

with client.messages.stream(
    model="flux-auto",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about routing."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
print()
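The text_stream helper hides the individual events. If you consume the raw events instead, the filtering it performs can be approximated as below. This is a sketch only: iter_text_deltas is a hypothetical helper, and the sample events are hand-written dictionaries in Anthropic's documented event shape rather than output captured from FluxRouter.

```python
from typing import Iterable, Iterator, Mapping


def iter_text_deltas(events: Iterable[Mapping]) -> Iterator[str]:
    """Yield text carried by content_block_delta events, ignoring the rest."""
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # message_start, content_block_start/stop, etc.
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")


# Hand-written sample events for illustration.
sample_events = [
    {"type": "message_start", "message": {"role": "assistant"}},
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Routes"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": " flow"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_stop"},
]
print("".join(iter_text_deltas(sample_events)))
```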
Cost on streaming responses
On non-streaming responses FluxRouter returns an X-Flux-Cost-Usd header with the request cost. On streaming responses that header is absent: the final cost is only known after the response headers have already been flushed to your client, so it cannot be included. The cost still lands on your bill as usual.
If you need the per-request cost programmatically, send the request without streaming and read X-Flux-Cost-Usd from the response headers. Either way, every request is metered and billed the same. See Routing and pricing for the full list of transparency headers and how billing works.
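Reading the header could look roughly like this. The helper cost_usd is hypothetical, and the sketch assumes the header value is a plain decimal string, which this page does not state; the sample value is made up. It returns None when the header is missing, as it is on streamed responses.

```python
from decimal import Decimal, InvalidOperation
from typing import Mapping, Optional

COST_HEADER = "x-flux-cost-usd"  # compared case-insensitively


def cost_usd(headers: Mapping[str, str]) -> Optional[Decimal]:
    """Return the request cost from response headers, or None if unavailable."""
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get(COST_HEADER)
    if raw is None:
        return None  # e.g. a streamed response
    try:
        return Decimal(raw.strip())
    except InvalidOperation:
        return None  # value was not a plain decimal (format is an assumption)


print(cost_usd({"X-Flux-Cost-Usd": "0.000412"}))  # made-up sample value
print(cost_usd({"Content-Type": "text/event-stream"}))
```

With the OpenAI Python SDK, one way to reach the headers of a non-streaming call is client.chat.completions.with_raw_response.create(...), whose result exposes .headers alongside .parse() for the usual completion object; the headers mapping can then be passed to a helper like the one above.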
Notes
- Streaming and the model that serves the request are independent: flux-auto still routes the request, and the other X-Flux-* headers (model, request id) are present on the streamed response.
- Token-level streaming behavior depends on the model you route to, but the wire format is the standard OpenAI (or Anthropic) event stream in every case.