Transcribe audio

Convert speech to text with the flux-voice model through FluxRouter.

FluxRouter exposes speech-to-text through the flux-voice model alias. You send an audio file and get a transcript back, using the same API key and base URL (https://api.fluxrouter.ai/v1) as your text requests. The request is metered and billed like any other Flux call.

This page covers transcribing an audio file. It uses the OpenAI-compatible /v1/audio/transcriptions endpoint, so if you already call OpenAI-style transcription, pointing at FluxRouter is a base-URL-and-key change.

The model id

Use flux-voice as the model. Where the text tier aliases (flux-fast, flux-standard, flux-reasoning) route to text models, flux-voice routes to a speech-to-text model behind the same key and base URL.

flux-voice automatically balances speed and accuracy per clip. Two explicit variants are available, alongside the default, when you want to pin one side of that trade-off:

flux-voice — the default. Picks the right engine for each clip automatically. Omit model entirely and you get this.
flux-voice-accurate — favor transcription accuracy.
flux-voice-fast — favor latency.

You can confirm the alias is live by listing models:

bash

curl https://api.fluxrouter.ai/v1/models \
  -H "Authorization: Bearer $FLUX_API_KEY"

GET /v1/models returns every flux-* id you can use. See Models for the catalog.

Transcribe a file

Because FluxRouter is OpenAI-compatible, transcription uses the OpenAI audio request shape: a model and a file, sent as multipart/form-data. The OpenAI SDK sends this through its audio.transcriptions.create method.

python

from openai import OpenAI

client = OpenAI(
    api_key="sk-...",                          # your Flux key
    base_url="https://api.fluxrouter.ai/v1",   # the one line you change
)

with open("meeting.m4a", "rb") as f:
    result = client.audio.transcriptions.create(
        model="flux-voice",
        file=f,
    )

print(result.text)

curl

bash

curl https://api.fluxrouter.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $FLUX_API_KEY" \
  -F "file=@meeting.m4a" \
  -F "model=flux-voice"

Request fields

file is required; everything else is optional.

Field	Notes
`file`	The audio. Accepts wav, mp3, mp4/m4a, mpeg/mpga, ogg, webm, and flac. Up to 8 MB.
`model`	`flux-voice` (default), `flux-voice-accurate`, or `flux-voice-fast`.
`language`	ISO-639-1 code (for example `en`). Omit to auto-detect.
`prompt`	A short hint of names, jargon, or spellings to improve accuracy.
`response_format`	`json` (default), `verbose_json`, or `text`.
`timestamp_granularities[]`	`word` and/or `segment`. Request `word` for word-level timestamps.
`temperature`	Sampling temperature.

Streaming (stream: true) and the subtitle formats (srt, vtt) are not supported and return a 400.
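The file constraints in the table can be checked on the client before uploading, which avoids a round trip for a file that would be rejected. The following is a minimal sketch, not an official helper: `preflight_check` is a hypothetical name, it assumes "8 MB" means 8,000,000 bytes (the stricter reading; the service may count differently), and it judges the format from the file extension alone.

```python
from pathlib import Path
from typing import List

# Extensions taken from the `file` row of the request-fields table.
ALLOWED_EXTENSIONS = {"wav", "mp3", "mp4", "m4a", "mpeg", "mpga", "ogg", "webm", "flac"}

# Assumption: "8 MB" read as 8,000,000 bytes, the stricter interpretation.
MAX_BYTES = 8 * 1000 * 1000


def preflight_check(path: str) -> List[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    p = Path(path)
    problems = []

    # Compare the extension case-insensitively, without the leading dot.
    ext = p.suffix.lower().lstrip(".")
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")

    size = p.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes; limit is {MAX_BYTES}")

    return problems
```

Call it before opening the file for upload, and show the returned messages to the user if the list is not empty.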

Response formats

json (default) — { "text": "..." }.
text — the transcript as a plain string.
verbose_json — text plus language, duration, and segments. Add timestamp_granularities[]=word to also get per-word timestamps in words — useful for cursor positioning and word-level editing in dictation UIs.

bash

curl https://api.fluxrouter.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $FLUX_API_KEY" \
  -F "file=@note.ogg" \
  -F "model=flux-voice" \
  -F "response_format=verbose_json" \
  -F "timestamp_granularities[]=word"
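For the cursor-positioning use case, a client usually needs to map a playback time to the word being spoken. The sketch below assumes the `words` array follows the OpenAI verbose_json shape, where each entry carries `word`, `start`, and `end` in seconds. This page does not spell out those field names, so treat them as an assumption. `word_at` is a hypothetical helper and the sample payload is made up.

```python
from bisect import bisect_right
from typing import List, Optional


def word_at(words: List[dict], t: float) -> Optional[dict]:
    """Return the word whose [start, end) interval contains time t, else None."""
    starts = [w["start"] for w in words]
    i = bisect_right(starts, t) - 1  # index of the last word starting at or before t
    if i >= 0 and t < words[i]["end"]:
        return words[i]
    return None  # t is before the first word, in a pause, or after the last word


# Made-up sample in the assumed shape of a verbose_json response.
sample = {
    "text": "ship it today",
    "words": [
        {"word": "ship", "start": 0.00, "end": 0.32},
        {"word": "it", "start": 0.32, "end": 0.45},
        {"word": "today", "start": 0.60, "end": 1.10},
    ],
}

hit = word_at(sample["words"], 0.40)
print(hit["word"] if hit else None)
```

The lookup relies on the words being sorted by start time, which is also an assumption about the response.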

Every successful response also carries transparency headers: x-flux-routed-model (the variant that served), x-flux-billed-seconds, and x-flux-cost-usd.
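If you want to log which variant served a request and what it cost, the three headers can be collected into one record. This is an illustrative sketch: `FluxUsage` and `parse_flux_headers` are hypothetical names, and it assumes x-flux-billed-seconds and x-flux-cost-usd arrive as plain decimal strings, which this page does not state.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Mapping, Optional


@dataclass
class FluxUsage:
    routed_model: Optional[str]
    billed_seconds: Optional[Decimal]
    cost_usd: Optional[Decimal]


def parse_flux_headers(headers: Mapping[str, str]) -> FluxUsage:
    """Pull the three transparency headers out of a response-header mapping."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lower = {k.lower(): v for k, v in headers.items()}

    def number(name: str) -> Optional[Decimal]:
        # Assumption: the value is a plain decimal string such as "12" or "0.02".
        raw = lower.get(name)
        return Decimal(raw) if raw is not None else None

    return FluxUsage(
        routed_model=lower.get("x-flux-routed-model"),
        billed_seconds=number("x-flux-billed-seconds"),
        cost_usd=number("x-flux-cost-usd"),
    )
```

With the OpenAI Python SDK, response headers should be reachable by calling `client.audio.transcriptions.with_raw_response.create(...)` and reading `.headers` on the result, with `.parse()` returning the usual transcription object.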

Paid plan required

Audio transcription is available on paid plans. A key without a paid plan receives a 402 with code premium_locked and no transcription. If you are building a client, handle that status distinctly — prompt the user to upgrade rather than treating it as a generic error. (A 401 means the key itself is missing or invalid; a 402 means a valid key without a paid plan.)
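One way to keep these cases distinct in a client is to map each documented status to its own action instead of funnelling everything into one error path. The sketch below uses only the status codes named on this page; the `Action` enum and `action_for_status` are hypothetical names chosen for illustration.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    FIX_REQUEST = "fix_request"        # 400: e.g. stream: true, srt, or vtt
    FIX_KEY = "fix_key"                # 401: key missing or invalid
    PROMPT_UPGRADE = "prompt_upgrade"  # 402: valid key without a paid plan
    SHRINK_FILE = "shrink_file"        # 413: upload over the size limit
    BACK_OFF = "back_off"              # 429: rate limited, retry later
    GENERIC_ERROR = "generic_error"    # anything else


_BY_STATUS = {
    400: Action.FIX_REQUEST,
    401: Action.FIX_KEY,
    402: Action.PROMPT_UPGRADE,
    413: Action.SHRINK_FILE,
    429: Action.BACK_OFF,
}


def action_for_status(status: int) -> Action:
    """Translate an HTTP status from the transcription endpoint into a client action."""
    if 200 <= status < 300:
        return Action.OK
    return _BY_STATUS.get(status, Action.GENERIC_ERROR)
```

With the OpenAI Python SDK, non-2xx responses raise `openai.APIStatusError`, whose `status_code` attribute can be passed to a function like this one.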

Limits

File size: up to 8 MB per request. Larger uploads return 413.
Rate: transcription is rate-limited per account. A burst over the limit returns 429; back off and retry.
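"Back off and retry" can be implemented as exponential backoff with jitter. The sketch below is one reasonable approach, not a documented FluxRouter policy: the attempt count and delays are arbitrary assumptions, and `RateLimited` is a hypothetical exception that your own request wrapper would raise when it sees a 429.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical marker: raise this when the API answers 429."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,          # assumption: not a documented limit
    base_delay: float = 1.0,        # assumption: seconds before the first retry
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimited with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except RateLimited:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the 429 to the caller
            # Wait a random time in [0, base_delay * 2^(attempt - 1)] seconds.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

The `sleep` parameter exists so the delay can be replaced in tests. Jitter spreads out retries from clients that hit the limit at the same moment.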

Billing

Transcription is billed at $0.10 per audio-minute. Usage is charged per second of audio, with a 10-second minimum per request, and rolls up into the same single bill as the rest of your Flux usage. The price is the same whichever variant serves the request. See Routing and pricing for how billing works.
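As a worked example of the rate above: a 90-second clip costs 90 × $0.10 / 60 = $0.15, while a 4-second clip is billed as 10 seconds, about $0.0167. The sketch below reproduces that arithmetic for estimating cost up front. `estimate_cost_usd` is a hypothetical helper, and two details are assumptions not stated on this page: fractional seconds are rounded up to a whole second, and the result is rounded to four decimal places for display. The x-flux-cost-usd response header remains the authoritative figure.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

RATE_PER_MINUTE = Decimal("0.10")  # USD per audio-minute, from this page
MIN_BILLED_SECONDS = 10            # per-request minimum, from this page


def estimate_cost_usd(audio_seconds: float) -> Decimal:
    """Estimate the charge for one transcription request."""
    # Assumption: a partial second counts as a full second.
    billed_seconds = max(MIN_BILLED_SECONDS, math.ceil(audio_seconds))
    cost = RATE_PER_MINUTE * billed_seconds / 60
    # Assumption: four decimal places, for display only.
    return cost.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


for seconds in (4, 90, 600):
    print(seconds, estimate_cost_usd(seconds))
```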

Notes

Use flux-voice for transcription; do not send audio to the text aliases.
The default flux-voice never transcribes worse than the accurate variant — it only chooses when to favor speed. Pin flux-voice-accurate or flux-voice-fast only when you have a specific reason.
For recorded-in-browser audio, ogg/opus is a good container choice and keeps the automatic speed/accuracy selection working.
