Truncated or empty output
Why FluxRouter responses stop early or come back blank, and how to fix it: max_tokens, streaming handling, and client cut-offs.
A response that stops mid-sentence or comes back empty is almost always a client-side limit or a handling bug, not a routing problem. Work through these in order.
Why does my response stop mid-sentence?
Symptom: The model output is cut off before it finishes.
Cause: max_tokens is too low. The model hits the output limit and stops.
Fix:
- Raise max_tokens to give the response room to finish. On the OpenAI chat path this is max_tokens (or max_completion_tokens); on the Anthropic path max_tokens is required.
- Check the response finish_reason. A value of length confirms the output was truncated by the token limit, not by the model finishing naturally (stop).

    {
      "choices": [
        {
          "finish_reason": "length",
          "message": { "role": "assistant", "content": "..." }
        }
      ]
    }

- If you are asking for long output (code files, long documents), set max_tokens generously.
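The finish_reason check can be automated. The following is a minimal sketch, assuming the OpenAI chat response shape from the example in this section has already been parsed into a Python dict. The helper name was_truncated and the two sample payloads are hypothetical, not part of any FluxRouter SDK.

```python
def was_truncated(response: dict) -> bool:
    """Return True if the first choice stopped because it hit the token limit."""
    choices = response.get("choices") or []
    if not choices:
        return False
    return choices[0].get("finish_reason") == "length"


# Hypothetical sample payloads in the OpenAI chat shape.
truncated = {
    "choices": [
        {"finish_reason": "length",
         "message": {"role": "assistant", "content": "..."}}
    ]
}
complete = {
    "choices": [
        {"finish_reason": "stop",
         "message": {"role": "assistant", "content": "Done."}}
    ]
}
```

A caller could log or retry with a higher max_tokens whenever was_truncated returns True, instead of silently passing cut-off text downstream.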
Why is my streamed response incomplete?
Symptom: Streaming output looks short or drops the end of the message.
Cause: The stream is not being fully consumed, or chunks are being dropped before the [DONE] event.
Fix:
- Read the stream to completion. Do not break out of the loop early, and wait for the terminating [DONE] event (OpenAI path) or the message_stop event (Anthropic path).
- Accumulate delta content across all chunks rather than reading only the first or last.
- Make sure your HTTP client is not buffering with a size cap that truncates the stream.
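As an illustration of reading to completion and accumulating deltas, here is a minimal sketch for the OpenAI path. It assumes the stream arrives as server-sent event lines of the form "data: {json}" and ends with "data: [DONE]". The function name collect_stream and the sample lines are hypothetical; a real client would feed it lines read from the HTTP response.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, bool]:
    """Join delta content from all chunks and report whether [DONE] was seen."""
    parts = []
    saw_done = False
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            saw_done = True
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        content = delta.get("content")
        if content:
            parts.append(content)  # keep every piece, not just first or last
    return "".join(parts), saw_done


# Hypothetical sample stream.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
text, finished = collect_stream(sample)
```

If finished is False after the loop ends, the stream closed before the terminating event, which points to a dropped connection or an early exit rather than a short model response.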
Why is the response completely empty?
Symptom: The request succeeds (200) but content is blank.
Cause: Usually one of: max_tokens set so low the model produced nothing usable, a reasoning-heavy model spending its budget before visible output, or your code reading the wrong field.
Fix:
- Read the right field. OpenAI chat: choices[0].message.content. OpenAI Responses: the output array. Anthropic: content[0].text.
- Raise max_tokens if it was set very low (for example under 64).
- Check finish_reason. A length finish with empty visible content means the budget was consumed before output was emitted; raise max_tokens.
- Confirm the request actually has a user message with non-empty content.
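Reading the wrong field is easy to rule out with a small helper. The sketch below is illustrative only: extract_text is a hypothetical name, and the nested layout it assumes for the OpenAI Responses output array (message items holding output_text parts) and for Anthropic content blocks (blocks with type text) should be checked against the payloads you actually receive.

```python
def extract_text(response: dict) -> str:
    """Return visible text from a parsed response, or "" if there is none."""
    # OpenAI chat: choices[0].message.content
    if "choices" in response:
        choices = response["choices"] or []
        if not choices:
            return ""
        message = choices[0].get("message") or {}
        return message.get("content") or ""

    # OpenAI Responses: walk the output array (assumed item/part layout)
    if "output" in response:
        texts = []
        for item in response["output"] or []:
            for part in item.get("content") or []:
                if part.get("type") == "output_text":
                    texts.append(part.get("text", ""))
        return "".join(texts)

    # Anthropic: text blocks in content (content[0].text in the simple case)
    blocks = response.get("content") or []
    return "".join(b.get("text", "") for b in blocks if b.get("type") == "text")


# Hypothetical sample payloads, one per response shape.
chat = {"choices": [{"message": {"role": "assistant", "content": "hi"}}]}
responses = {"output": [{"type": "message",
                         "content": [{"type": "output_text", "text": "hi"}]}]}
anthropic = {"content": [{"type": "text", "text": "hi"}]}
empty_chat = {"choices": [{"finish_reason": "length",
                           "message": {"role": "assistant", "content": None}}]}
```

An empty string from this helper together with a length finish_reason suggests the token budget ran out before any visible output, not a parsing mistake.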
Why does it cut off only in my app, but work in curl?
Symptom: The same prompt returns full output in curl but truncates in your application.
Cause: A client-side read timeout or a response size cap in your HTTP layer is cutting the connection before the full body arrives.
Fix:
- Increase your client read timeout. Long generations can take longer than a default 30s timeout.
- Prefer streaming for long responses so you receive output incrementally instead of waiting for one large body.
- See Latency and timeouts for client timeout settings.
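As a rough sketch of raising the client timeout, the snippet below uses Python's standard http.client. The host name fluxrouter.example and the 120-second value are placeholders chosen for illustration, not documented FluxRouter settings. Creating the connection object does not open a socket; the timeout applies to each blocking socket operation, including reads, once a request is made.

```python
import http.client

# Assumed value; size it to your longest expected generation.
READ_TIMEOUT_SECONDS = 120

# Hypothetical host; replace with your actual endpoint.
conn = http.client.HTTPSConnection(
    "fluxrouter.example",
    timeout=READ_TIMEOUT_SECONDS,
)
```

Other HTTP libraries expose the same idea under different names, so check which of their settings governs reads as opposed to connection setup.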