Use Flux for a coding agent

Point a coding agent at FluxRouter so flux-auto right-sizes every call. You stop paying frontier rates for routine tool calls and keep strong models for the hard steps.

A coding agent does not make one big call. It makes many: read a file, decide the next step, call a tool, summarize the result, plan, edit, repeat. Most of those calls are routine. A few are genuinely hard. FluxRouter is OpenAI-compatible, so any agent that lets you set an OpenAI base URL works without code changes: set the base URL to https://api.fluxrouter.ai/v1, use your Flux key (sk-...), and set the model to flux-auto.

Why flux-auto fits an agent

The cost problem with agents is that the obvious setup pins every call to one frontier model, so you pay the top rate for "list the files in this directory" and for "design a migration plan" alike. Most agent calls are the cheap kind.

With flux-auto, FluxRouter right-sizes each call: the routine tool calls and short decisions go to cheap, fast models, and the hard reasoning steps go to a stronger one. Across a full agent run, which is mostly routine calls, that means a blended rate instead of a frontier rate on everything. You do not have to classify each call yourself.

Connect a coding tool

If you are using an existing agent or editor (Claude Code, Codex, Cursor, Cline, Aider, Continue, and others), you usually do not touch code at all. You point the tool's OpenAI-compatible base URL at FluxRouter and paste your Flux key. See Connect your tools for the exact setting per tool.

Build your own agent loop

If you are writing the agent yourself, it is the standard OpenAI chat-completions loop with tools. Only the base URL, key, and model differ from a direct OpenAI setup.

python
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",                          # your Flux key
    base_url="https://api.fluxrouter.ai/v1",   # the one line you change
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the repo.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }
]

messages = [{"role": "user", "content": "Find and fix the failing test."}]

while True:
    response = client.chat.completions.create(
        model="flux-auto",
        messages=messages,
        tools=tools,
    )
    msg = response.choices[0].message
    messages.append(msg)

    if not msg.tool_calls:
        print(msg.content)
        break

    for call in msg.tool_calls:
        result = run_tool(call.function.name, call.function.arguments)  # your code
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })

flux-auto right-sizes each iteration of this loop. For the full tool-calling request and response shape, see Tool and function calling.

Pin a strong model for the hard steps

If a specific step needs a known frontier model every time (for example, the final plan or a tricky refactor), you do not have to send the whole run there. Pin just that call to a strong model with a flux-pinned-* id or a tier alias, and leave the rest of the loop on flux-auto:

python
# Routine steps: let routing right-size them
response = client.chat.completions.create(model="flux-auto", messages=messages, tools=tools)

# A hard planning step: pin a strong model just for this call
plan = client.chat.completions.create(model="flux-reasoning", messages=plan_messages)

This keeps the cheap calls cheap and gives you determinism exactly where you want it. See Models for the available aliases and flux-pinned-* ids.

Practical tips

  • Retry on 429. Long agent runs make many calls; back off and retry rather than aborting the run.
  • Watch your spend ceiling. Agents can fan out into many calls fast. Keep an eye on usage so a runaway loop does not burn through your budget. See Routing and pricing.
  • Log which model answered. The X-Flux-Model and X-Flux-Routed response headers show what served each call, which is useful when debugging an agent's behavior.

Next steps