Truncated or empty output

Why FluxRouter responses stop early or come back blank, and how to fix it: max_tokens, streaming handling, and client cut-offs.

A response that stops mid-sentence or comes back empty is almost always caused by a client-side limit or a handling bug, not by a routing problem. Work through the sections below in order.

Why does my response stop mid-sentence?

Symptom: The model output is cut off before it finishes.

Cause: max_tokens is too low. The model hits the output limit and stops.

Fix:

  • Raise max_tokens to give the response room to finish. On the OpenAI chat path this is max_tokens (or max_completion_tokens); on the Anthropic path max_tokens is required.

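  • Check finish_reason in the response. A value of length confirms the output was truncated by the token limit; a value of stop means the model finished naturally. On the Anthropic path the equivalent field is stop_reason, where max_tokens indicates truncation.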

    ```json
    {
      "choices": [
        { "finish_reason": "length", "message": { "role": "assistant", "content": "..." } }
      ]
    }
    ```

  • If you are asking for long output (code files, long documents), set max_tokens well above the length you expect, up to the model's maximum output limit.
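
The finish_reason check can be automated. The sketch below assumes an OpenAI-style chat completion body that has already been parsed into a dict, in the same shape as the JSON shown above. The helper name truncated_choices is hypothetical and not part of any FluxRouter SDK.

```python
import json


def truncated_choices(body: dict) -> list[int]:
    """Return the indexes of choices that stopped at the token limit."""
    return [
        index
        for index, choice in enumerate(body.get("choices", []))
        if choice.get("finish_reason") == "length"
    ]


# Example body in the same shape as the JSON shown above.
raw = (
    '{"choices": [{"finish_reason": "length", '
    '"message": {"role": "assistant", "content": "..."}}]}'
)
body = json.loads(raw)

if truncated_choices(body):
    print("Output hit the token limit; raise max_tokens and retry.")
```

Logging this check on every response makes it easy to tell truncation apart from the other causes on this page.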

Why is my streamed response incomplete?

Symptom: Streaming output looks short or drops the end of the message.

Cause: The stream is not being fully consumed, or chunks are being dropped before the [DONE] event.

Fix:

  • Read the stream to completion. Do not break out of the loop early, and wait for the terminating [DONE] event (OpenAI path) or the message_stop event (Anthropic path).
  • Accumulate delta content across all chunks rather than reading only the first or last.
  • Make sure your HTTP client is not buffering with a size cap that truncates the stream.
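
As a minimal sketch of the first two points, the function below accumulates delta content from OpenAI-style server-sent event lines and reports whether the terminating [DONE] marker was seen. It assumes a single choice per chunk and that the lines have already been read from the HTTP response; the name accumulate_stream is hypothetical.

```python
import json
from typing import Iterable


def accumulate_stream(lines: Iterable[str]) -> tuple[str, bool]:
    """Join delta content from SSE lines; report whether [DONE] was seen."""
    parts: list[str] = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return "".join(parts), True
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            content = delta.get("content")
            if content:
                parts.append(content)
    # The iterator ended without [DONE]: the stream was cut off early.
    return "".join(parts), False


sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
text, complete = accumulate_stream(sample)
```

If the second return value is False, the connection closed before the stream finished, which points at the client or network rather than at max_tokens.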

Why is the response completely empty?

Symptom: The request succeeds (200) but content is blank.

Cause: Usually one of three things: max_tokens is set so low that the model produced nothing usable, a reasoning-heavy model spent its token budget on reasoning before emitting visible output, or your code is reading the wrong field.

Fix:

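  • Read the right field. OpenAI chat: choices[0].message.content. OpenAI Responses: the text parts inside the message items of the output array. Anthropic: content[0].text, or more generally the text of every content block whose type is text.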
  • Raise max_tokens if it was set very low (for example under 64).
  • Check finish_reason. A length finish with empty visible content means the budget was consumed before output was emitted; raise max_tokens.
  • Confirm the request actually has a user message with non-empty content.
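
To rule out reading the wrong field, a single helper can handle all three response shapes. This is a sketch under assumptions: it expects the parsed JSON body as a dict, and it assumes the response shapes match the upstream OpenAI and Anthropic formats. The name extract_text is hypothetical.

```python
def extract_text(body: dict) -> str:
    """Return the visible text from any of the three response shapes."""
    # OpenAI chat completions: choices[0].message.content
    if "choices" in body:
        choices = body["choices"]
        if not choices:
            return ""
        return (choices[0].get("message") or {}).get("content") or ""
    # OpenAI Responses: output is a list of items; text sits in message items.
    if "output" in body:
        return "".join(
            part.get("text", "")
            for item in body["output"]
            if item.get("type") == "message"
            for part in item.get("content", [])
            if part.get("type") == "output_text"
        )
    # Anthropic messages: content is a list of blocks; keep only text blocks.
    content = body.get("content")
    if isinstance(content, list):
        return "".join(
            block.get("text", "") for block in content if block.get("type") == "text"
        )
    return ""


chat_body = {"choices": [{"message": {"role": "assistant", "content": "hi"}}]}
print(extract_text(chat_body))
```

An empty string from this helper combined with a length finish suggests the budget ran out before any visible output was emitted.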

Why does it cut off only in my app, but work in curl?

Symptom: The same prompt returns full output in curl but truncates in your application.

Cause: A client-side read timeout or a response size cap in your HTTP layer is cutting the connection before the full body arrives.

Fix:

  • Increase your client read timeout. Long generations can take longer than a default 30s timeout.
  • Prefer streaming for long responses so you receive output incrementally instead of waiting for one large body.
  • See Latency and timeouts for client timeout settings.
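
The sketch below shows one way to set an explicit timeout using only the Python standard library. The endpoint URL, model name, bearer-token header, and 300-second value are all placeholders chosen for illustration, not documented FluxRouter values.

```python
import json
import urllib.request

# Hypothetical endpoint and key; substitute your own values.
URL = "https://fluxrouter.example/v1/chat/completions"
API_KEY = "YOUR_API_KEY"


def build_request(prompt: str, max_tokens: int) -> urllib.request.Request:
    """Build a POST request for an OpenAI-style chat completion."""
    payload = {
        "model": "example-model",  # placeholder model name
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 300.0) -> dict:
    """Send the request; timeout_s bounds the connect and each blocking read."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read())


request = build_request("Write a long report.", max_tokens=4096)
# body = send(request)  # uncomment once URL and API_KEY are real
```

With urllib the timeout applies to each blocking socket operation rather than to the whole request, so a non-streamed response that takes a long time to begin arriving can still trip it.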