Routing & pricing
How flux-auto routes each request, what the three pricing lanes cost, pay-as-you-go billing, and the transparency headers that show which model answered.
FluxRouter's job is to right-size each request: send simple work to a cheap, fast model and hard work to a stronger one, without you having to pick. You set model to flux-auto, FluxRouter chooses the model, and every response tells you what it chose.
How flux-auto routes
When you send a request with model: "flux-auto", FluxRouter inspects the request and picks a sensible model for it. Today this is deterministic tier routing: the request is matched to a tier (fast, standard, or reasoning) and served by that tier's default model. The same kind of request gets the same class of model, so behavior is predictable.
You stay in control:
- Want a specific class without naming a model? Use a tier alias: flux-fast, flux-standard, or flux-reasoning.
- Want one exact model every time? Pin it with a flux-pinned-* id. See Models.
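The three ways of choosing a model differ only in the string sent as model. The following is a minimal sketch, assuming the request body has the same shape as the curl example later on this page. The helper names classify_model_id and build_chat_request are hypothetical and are not part of any FluxRouter SDK.

```python
# Sketch only: the ids come from the text above, the helper names are hypothetical.
TIER_ALIASES = ("flux-fast", "flux-standard", "flux-reasoning")
PINNED_PREFIX = "flux-pinned-"


def classify_model_id(model: str) -> str:
    """Return 'auto', 'tier', 'pinned', or 'other' for a model id string."""
    if model == "flux-auto":
        return "auto"
    if model in TIER_ALIASES:
        return "tier"
    # A pinned id needs something after the prefix; real ids are listed under Models.
    if model.startswith(PINNED_PREFIX) and len(model) > len(PINNED_PREFIX):
        return "pinned"
    return "other"


def build_chat_request(prompt: str, model: str = "flux-auto") -> dict:
    """Build a chat completion body; only the model field changes between modes."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


for model_id in ("flux-auto", "flux-reasoning"):
    print(model_id, "->", classify_model_id(model_id))
```

Defaulting to flux-auto keeps the router in charge unless the caller opts into a tier alias or a pinned id.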
Pricing lanes
Models are priced in three lanes. You pay per token at the lane rate of whichever model served the request. All rates are pay-as-you-go.
| Lane | Starts at | For |
|---|---|---|
| Express Lane | $1 / 1M tokens | Lightweight, high-volume, latency-sensitive work |
| Daily Driver | $2 / 1M tokens | General-purpose coding, writing, and analysis |
| Deep Thought | $4 / 1M tokens | The hardest reasoning and frontier-model work |
"Starts at" is the floor for that lane; individual models within a lane bill at their own published rate. The live rate for any model is what you are charged.
Pay-as-you-go
You pay only for what you route: there is no per-seat fee on usage and no minimum commitment before you can send a request. Plans set an included credit and a monthly spend ceiling, while pay-as-you-go bills your usage directly. Because flux-auto sends simple requests to cheaper models, a typical mixed workload costs less than pinning everything to a single frontier model.
See the pricing page for current plans and ceilings.
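As a rough illustration of per-token billing, the sketch below estimates a lower bound on cost from the "Starts at" rates in the Pricing lanes table. It assumes every token in a request is billed at one flat rate. The function name floor_cost_usd is hypothetical, and real charges follow each model's published rate, which may be higher than its lane floor.

```python
from decimal import Decimal

# Lane floors from the Pricing lanes table, in USD per 1M tokens.
LANE_FLOOR_USD_PER_MTOK = {
    "Express Lane": Decimal("1"),
    "Daily Driver": Decimal("2"),
    "Deep Thought": Decimal("4"),
}


def floor_cost_usd(lane: str, tokens: int) -> Decimal:
    """Lower-bound cost of `tokens` tokens at the lane's 'starts at' rate."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return LANE_FLOOR_USD_PER_MTOK[lane] * Decimal(tokens) / Decimal(1_000_000)


print(floor_cost_usd("Daily Driver", 250_000))  # 0.5
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift.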
Transparency headers
Chat completion responses carry X-Flux-* headers so you can see exactly what happened:
| Header | Value | Meaning |
|---|---|---|
| X-Flux-Model | e.g. claude-haiku | The model that actually served the request |
| X-Flux-Original-Model | e.g. flux-auto | The model id you requested |
| X-Flux-Routed | true / false | Whether the router changed the model |
| X-Flux-Request-Id | a unique id | Identifier for support and debugging |
| X-Flux-Cost-Usd | e.g. 0.000412 | What this request cost, in USD (non-streaming responses) |
Read these to confirm which model answered, see what it cost, and keep a request id for support. Inspect them with curl:
curl -i https://api.fluxrouter.ai/v1/chat/completions \
-H "Authorization: Bearer $FLUX_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "flux-auto",
"messages": [{ "role": "user", "content": "ping" }]
}'
# Response headers include:
# X-Flux-Model: ...
# X-Flux-Original-Model: flux-auto
# X-Flux-Routed: true
# X-Flux-Request-Id: ...
# X-Flux-Cost-Usd: 0.000412
The cost header is present on non-streaming responses. On streaming responses the final cost is only known after the headers have flushed, so it is reported on your bill rather than in the header.
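In application code the same headers can be read from whatever header mapping your HTTP client exposes. The sketch below is one possible way to do that, assuming the header names and value formats in the table above. The FluxRouting class, the parse_flux_headers function, and the sample request id are hypothetical. The cost field is optional because the cost header is absent on streaming responses.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Mapping, Optional


@dataclass(frozen=True)
class FluxRouting:
    model: str
    original_model: str
    routed: bool
    request_id: str
    cost_usd: Optional[Decimal]  # None when the cost header is absent (streaming)


def parse_flux_headers(headers: Mapping[str, str]) -> FluxRouting:
    """Read the X-Flux-* headers from a response's header mapping."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v.strip() for k, v in headers.items()}
    cost = h.get("x-flux-cost-usd")
    return FluxRouting(
        model=h["x-flux-model"],
        original_model=h["x-flux-original-model"],
        routed=h["x-flux-routed"].lower() == "true",
        request_id=h["x-flux-request-id"],
        cost_usd=Decimal(cost) if cost else None,
    )


sample = {
    "X-Flux-Model": "claude-haiku",
    "X-Flux-Original-Model": "flux-auto",
    "X-Flux-Routed": "true",
    "X-Flux-Request-Id": "req-example",  # placeholder, not a real id format
    "X-Flux-Cost-Usd": "0.000412",
}
info = parse_flux_headers(sample)
print(info.model, info.routed, info.cost_usd)  # claude-haiku True 0.000412
```

Logging request_id alongside model and cost_usd gives you the identifier to quote when you contact support.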
Cost vs going direct
The point of routing is that you do not overpay for easy requests. With flux-auto, simple prompts land on Express Lane and Daily Driver models and only the hard ones reach Deep Thought, so a mixed workload pays a blended rate instead of a frontier rate on everything. The headers above show which model served each response and, for non-streaming responses, what it cost, and your spend rolls up to a single bill.
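To make the blended rate concrete, the sketch below computes a token-weighted average over the three lanes, using the "Starts at" floors from the pricing table. The 60/30/10 split is a hypothetical workload chosen for illustration, not measured data, and the function name blended_rate is likewise an assumption.

```python
from decimal import Decimal

# Lane floors from the pricing table, in USD per 1M tokens.
FLOOR = {
    "Express Lane": Decimal("1"),
    "Daily Driver": Decimal("2"),
    "Deep Thought": Decimal("4"),
}


def blended_rate(token_share: dict) -> Decimal:
    """Token-weighted average USD per 1M tokens; shares must sum to 1."""
    if sum(token_share.values()) != Decimal("1"):
        raise ValueError("token shares must sum to 1")
    return sum(
        (FLOOR[lane] * share for lane, share in token_share.items()),
        Decimal("0"),
    )


# Hypothetical mix: 60% Express Lane, 30% Daily Driver, 10% Deep Thought.
mix = {
    "Express Lane": Decimal("0.6"),
    "Daily Driver": Decimal("0.3"),
    "Deep Thought": Decimal("0.1"),
}
print(blended_rate(mix))      # 1.6
print(FLOOR["Deep Thought"])  # 4
```

Under these assumptions the mixed workload averages $1.60 per 1M tokens at floor rates, against $4 per 1M tokens if every request went to a Deep Thought model.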