Vision and image inputs

Send images to the model using the standard OpenAI multimodal message shape with image URLs or base64 data.

Vision means sending an image to the model alongside text so it can describe, read, or reason about the picture. FluxRouter accepts the standard OpenAI multimodal message shape at https://api.fluxrouter.ai/v1/chat/completions, so you pass images exactly as you would when calling OpenAI. The only Flux-specific parts are the base URL and the flux-auto model.

The multimodal message shape

Instead of a plain string, the content of a user message is an array of parts. Each part is either text (type: "text") or an image (type: "image_url").

json
{
  "role": "user",
  "content": [
    { "type": "text", "text": "What is in this image?" },
    { "type": "image_url", "image_url": { "url": "https://example.com/cat.jpg" } }
  ]
}
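
A small helper can make this shape harder to get wrong. The sketch below builds the message from a prompt and an image URL; the function name image_message is hypothetical and not part of any SDK.

```python
def image_message(prompt: str, image_url: str) -> dict:
    """Build a user message with one text part and one image part.

    Hypothetical helper for illustration; it only assembles the
    dictionary shown above and makes no network call.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


message = image_message("What is in this image?", "https://example.com/cat.jpg")
```

The resulting dictionary can be passed as one element of the messages list.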

Send an image by URL

The simplest case: point the model at a public image URL.

python
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",                          # your Flux key
    base_url="https://api.fluxrouter.ai/v1",   # the one line you change
)

response = client.chat.completions.create(
    model="flux-auto",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)

Send an image as base64

For local images, encode the file and pass it as a data: URL. This works the same way; only the url value changes. Make sure the MIME type in the data: URL (image/png below) matches the file's actual format.

python
import base64
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://api.fluxrouter.ai/v1",
)

with open("diagram.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="flux-auto",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this diagram show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
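
The example above hardcodes image/png. If your application handles several image formats, one option is to derive the MIME type from the filename with Python's standard mimetypes module. This is a minimal sketch; encode_data_url is a hypothetical helper name, and guessing from the file extension is an assumption that only holds when extensions are trustworthy.

```python
import base64
import mimetypes


def encode_data_url(data: bytes, filename: str) -> str:
    """Return a data: URL for image bytes, guessing the MIME type
    from the filename extension (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {filename}")
    b64 = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

With a real file, read its bytes in binary mode and pass the returned string as the url value of the image_url part.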

curl (image URL)

bash
curl https://api.fluxrouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $FLUX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux-auto",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What is in this image?" },
          { "type": "image_url", "image_url": { "url": "https://example.com/cat.jpg" } }
        ]
      }
    ]
  }'

Make sure your model can see images

Vision is a capability of the underlying model, not of FluxRouter. flux-auto routes each request to a suitable model, and when a message includes an image the router aims for a vision-capable one. Image support still depends on the model that ultimately serves the request.

If your application always sends images, pin a vision-capable model so every request lands on one that supports image inputs. Pass a flux-pinned-* id (for example flux-pinned-claude-sonnet, flux-pinned-gpt-5, or flux-pinned-gemini-3-1-pro) instead of flux-auto. See Models for the full list.
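
If only some of your requests carry images, you could choose the model id per request instead of pinning everywhere. The sketch below is one way to do that, under assumptions: has_image and choose_model are hypothetical helpers, and flux-pinned-gpt-5 is used only because it appears in the examples above. Check Models for the id that fits your use case.

```python
def has_image(messages: list[dict]) -> bool:
    """Return True if any message carries an image_url content part."""
    for message in messages:
        content = message.get("content")
        # Plain-string content cannot contain an image part.
        if isinstance(content, list):
            if any(part.get("type") == "image_url" for part in content):
                return True
    return False


def choose_model(messages: list[dict], vision_model: str = "flux-pinned-gpt-5") -> str:
    """Pick a pinned id when images are present, else flux-auto.

    The default vision_model is an assumption taken from the examples
    on this page, not a recommendation.
    """
    return vision_model if has_image(messages) else "flux-auto"
```

Pass the result as the model argument of client.chat.completions.create.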

Notes

  • This page covers image inputs (the model reads an image you send). To create images from a text prompt, see Generate images.
  • The Anthropic-compatible base at https://api.fluxrouter.ai/anthropic also accepts images, using Anthropic's native image content blocks (a source with type: "base64" or type: "url"), exactly as you would when calling Anthropic directly.
  • Multiple images per message are allowed; add more image_url parts to the same content array.
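
The last two notes can be sketched as plain data builders. The first function assembles an OpenAI-shaped content array with several image_url parts. The second builds an Anthropic-style image block with a base64 source, following Anthropic's published block format. Both function names are hypothetical, and neither makes a network call.

```python
def multi_image_content(prompt: str, urls: list[str]) -> list[dict]:
    """OpenAI-shaped content array: one text part, then one
    image_url part per URL (hypothetical helper)."""
    parts = [{"type": "text", "text": prompt}]
    parts += [{"type": "image_url", "image_url": {"url": u}} for u in urls]
    return parts


def anthropic_image_block(b64_data: str, media_type: str = "image/png") -> dict:
    """Anthropic-native image content block with a base64 source
    (hypothetical helper; for use with the Anthropic-compatible base)."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": b64_data,
        },
    }


content = multi_image_content(
    "What differs between these two images?",
    ["https://example.com/cat.jpg", "https://example.com/dog.jpg"],
)
```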