Build a chatbot on Flux

Point an OpenAI-compatible chat loop at FluxRouter with flux-auto, stream tokens for a responsive UI, and let routing right-size each turn.

A chatbot is a loop: you keep a list of messages, append the user's turn, send the whole list to the model, and append the reply. FluxRouter is OpenAI-compatible, so you build this exactly the way you would against OpenAI. The only Flux-specific parts are the base URL (https://api.fluxrouter.ai/v1), your Flux key (sk-...), and the model (flux-auto).

Why flux-auto fits a chatbot

The turns in a chat are not equally hard. "Hi", "thanks", and "what time is it" are cheap. "Explain this stack trace" or "rewrite this function to be O(n)" are not. With flux-auto, FluxRouter right-sizes each turn: simple turns go to a cheap, fast model and hard turns go to a stronger one. So a real conversation, which is a mix of both, pays a blended rate instead of a frontier rate on every message. You do not have to detect difficulty yourself or maintain your own routing rules.

If you need one consistent model for every turn (for example, to fix a tone or a behavior), pin it with a flux-pinned-* id instead of flux-auto. See Models.

Minimal chat loop

This is a complete terminal chatbot. It keeps the message history, streams each reply token by token, and adds the assistant's reply back to the history so the model has context on the next turn.

python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["FLUX_API_KEY"],        # your Flux key (sk-...), read from the environment
    base_url="https://api.fluxrouter.ai/v1",   # the one line you change
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
]

while True:
    user_input = input("you: ")
    if user_input.strip() in {"exit", "quit"}:
        break

    messages.append({"role": "user", "content": user_input})

    stream = client.chat.completions.create(
        model="flux-auto",
        messages=messages,
        stream=True,
    )

    print("bot: ", end="", flush=True)
    reply = ""
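    for chunk in stream:
        # Some chunks can arrive with no choices (for example a final usage chunk); skip them.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
        reply += delta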
    print()

    messages.append({"role": "assistant", "content": reply})

The same loop in TypeScript:

ts
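import OpenAI from "openai";
import * as readline from "node:readline/promises";

const client = new OpenAI({
  apiKey: process.env.FLUX_API_KEY,            // your Flux key
  baseURL: "https://api.fluxrouter.ai/v1",     // the one line you change
});

// Typed explicitly so the array accepts user and assistant turns, not just the system message.
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
  { role: "system", content: "You are a helpful assistant." },
];

const rl = readline.createInterface({ input: process.stdin, output: process.stdout });

while (true) {
  const userInput = await rl.question("you: ");
  if (["exit", "quit"].includes(userInput.trim())) break;

  messages.push({ role: "user", content: userInput });

  const stream = await client.chat.completions.create({
    model: "flux-auto",
    messages,
    stream: true,
  });

  process.stdout.write("bot: ");
  let reply = "";
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? "";
    process.stdout.write(delta);
    reply += delta;
  }
  process.stdout.write("\n");

  // Add the reply to the history so the next turn has context.
  messages.push({ role: "assistant", content: reply });
}

rl.close();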

Stream for a responsive UI

For anything user-facing, set stream: true. Streaming sends tokens as they are generated instead of waiting for the full reply, so the user sees text appear immediately. This is the single biggest perceived-latency win for a chatbot, and because flux-auto sends simple turns to fast models, those turns also finish quickly. For the full streaming setup in both formats, including server-sent events for a web app, see Streaming responses.

Keep the conversation in context

The model has no memory between requests. You provide the memory by sending the full messages array every turn, including past user and assistant messages. Append each reply to the array as shown above. If conversations get long, trim or summarize older turns to stay within the model's context window and to control token cost, since you pay for the input tokens you send each turn.
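One way to trim older turns is sketched below. This is a minimal illustration, not a Flux feature: trim_history and max_messages are hypothetical names, and the sketch counts messages rather than tokens, which is only a rough proxy for context size and cost. It keeps every system message and the most recent user and assistant messages.

```python
def trim_history(messages, max_messages=20):
    """Return the system messages plus the most recent conversation messages.

    `messages` is the usual list of {"role": ..., "content": ...} dicts.
    `max_messages` (hypothetical parameter) caps how many non-system
    messages are kept. The input list is not modified.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    # Slice from the end; guard against max_messages == 0, because
    # rest[-0:] would return the whole list instead of nothing.
    recent = rest[-max_messages:] if max_messages > 0 else []

    # Avoid starting the kept window on an assistant reply whose
    # user question was trimmed away.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]

    return system + recent
```

In the chat loop, you could pass trim_history(messages) as the messages argument instead of the full list. Summarizing older turns instead of dropping them keeps more context, at the cost of an extra model call.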

Practical tips

  • Set a system message to fix the bot's role and tone. It is sent on every turn, so keep it concise.
  • Show which model answered. Every response carries X-Flux-Model and X-Flux-Routed headers, so you can display or log which model served a turn. See Routing and pricing.
  • Handle rate limits. On a 429, back off and retry rather than dropping the turn.
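The last two tips can be sketched as a pair of small helpers. This is an illustration under stated assumptions, not Flux's recommended implementation: routing_label and call_with_backoff are hypothetical names, the retry count and delays are arbitrary defaults, and the helpers use only the standard library so they work with any client. The header names X-Flux-Model and X-Flux-Routed are the ones mentioned above; the format of their values is not assumed.

```python
import random
import time
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


def routing_label(headers: Mapping[str, str]) -> str:
    """Build a short log line from the routing headers of one response."""
    model = headers.get("X-Flux-Model", "unknown")
    routed = headers.get("X-Flux-Routed", "unknown")
    return f"model={model} routed={routed}"


def call_with_backoff(
    call: Callable[[], T],
    is_rate_limited: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying with exponential backoff while it is rate limited.

    `is_rate_limited` decides whether an exception represents a 429.
    Any other exception, or a 429 on the final attempt, is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            is_last = attempt == max_attempts - 1
            if is_last or not is_rate_limited(exc):
                raise
            # Exponential delay (1s, 2s, 4s, ...) plus jitter so that
            # many clients do not retry at the same instant.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))

    raise RuntimeError("unreachable")  # the loop always returns or raises
```

With the OpenAI Python SDK, a 429 is raised as openai.RateLimitError, so is_rate_limited can be lambda e: isinstance(e, openai.RateLimitError). To read the headers, client.chat.completions.with_raw_response.create(...) returns an object with a headers mapping and a parse() method that yields the usual completion. The SDK client also accepts a max_retries option that retries some failures on its own, so check whether that already covers your case before adding a second retry layer.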

Next steps