Transcribe audio

Convert speech to text with the flux-voice model through FluxRouter.

FluxRouter exposes speech-to-text through the flux-voice model alias. You send an audio file and get a transcript back, using the same API key and base URL (https://api.fluxrouter.ai/v1) as your text requests. The request is metered and billed like any other Flux call.

This page covers transcribing an audio file. It uses the OpenAI-compatible /v1/audio/transcriptions endpoint, so if you already call OpenAI-style transcription, pointing at FluxRouter is a base-URL-and-key change.

The model id

Use flux-voice as the model. Where the text tier aliases (flux-fast, flux-standard, flux-reasoning) route to text models, flux-voice routes to a speech-to-text model behind the same key and endpoint.

flux-voice automatically balances speed and accuracy per clip. Two explicit variants are available when you want to pin one:

  • flux-voice — the default. Picks the right engine for each clip automatically. Omit model entirely and you get this.
  • flux-voice-accurate — favor transcription accuracy.
  • flux-voice-fast — favor latency.

You can confirm the alias is live by listing models:

bash
curl https://api.fluxrouter.ai/v1/models \
  -H "Authorization: Bearer $FLUX_API_KEY"

GET /v1/models returns every flux-* id you can use. See Models for the catalog.

Transcribe a file

Because FluxRouter is OpenAI-compatible, transcription uses the OpenAI audio request shape: a model and a file, sent as multipart/form-data. The OpenAI SDK sends this through its audio.transcriptions.create method.

python
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",                          # your Flux key
    base_url="https://api.fluxrouter.ai/v1",   # the one line you change
)

with open("meeting.m4a", "rb") as f:
    result = client.audio.transcriptions.create(
        model="flux-voice",
        file=f,
    )

print(result.text)

curl

bash
curl https://api.fluxrouter.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $FLUX_API_KEY" \
  -F "file=@meeting.m4a" \
  -F "model=flux-voice"

Request fields

file is required; everything else is optional.

FieldNotes
fileThe audio. Accepts wav, mp3, mp4/m4a, mpeg/mpga, ogg, webm, and flac. Up to 8 MB.
modelflux-voice (default), flux-voice-accurate, or flux-voice-fast.
languageISO-639-1 code (for example en). Omit to auto-detect.
promptA short hint of names, jargon, or spellings to improve accuracy.
response_formatjson (default), verbose_json, or text.
timestamp_granularities[]word and/or segment. Request word for word-level timestamps.
temperatureSampling temperature.

Streaming (stream: true) and the subtitle formats (srt, vtt) are not supported and return a 400.

Response formats

  • json (default) — { "text": "..." }.
  • text — the transcript as a plain string.
  • verbose_jsontext plus language, duration, and segments. Add timestamp_granularities[]=word to also get per-word timestamps in words — useful for cursor positioning and word-level editing in dictation UIs.
bash
curl https://api.fluxrouter.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $FLUX_API_KEY" \
  -F "file=@note.ogg" \
  -F "model=flux-voice" \
  -F "response_format=verbose_json" \
  -F "timestamp_granularities[]=word"

Every successful response also carries transparency headers: x-flux-routed-model (the variant that served), x-flux-billed-seconds, and x-flux-cost-usd.

Audio transcription is available on paid plans. A key without a paid plan receives a 402 with code premium_locked and no transcription. If you are building a client, handle that status distinctly — prompt the user to upgrade rather than treating it as a generic error. (A 401 means the key itself is missing or invalid; a 402 means a valid key without a paid plan.)

Limits

  • File size: up to 8 MB per request. Larger uploads return 413.
  • Rate: transcription is rate-limited per account. A burst over the limit returns 429; back off and retry.

Billing

Transcription is billed at $0.10 per audio-minute, charged per second of audio with a 10-second minimum per request, and rolls up to the same single bill as the rest of your Flux usage. The price is the same whichever variant serves. See Routing and pricing for how billing works.

Notes

  • Use flux-voice for transcription; do not send audio to the text aliases.
  • The default flux-voice never transcribes worse than the accurate variant — it only chooses when to favor speed. Pin flux-voice-accurate or flux-voice-fast only when you have a specific reason.
  • For recorded-in-browser audio, ogg/opus is a good container choice and keeps the automatic speed/accuracy selection working.