Ask any team what their last AI feature actually cost to run. Not the monthly invoice. The feature itself. The one that sorts support tickets, or the chatbot that answers your customers. Cost per call. Which model handled it. How much of that spend was just retries of stuff that failed the first time around.
Most of them can't tell you. And it isn't because they're sloppy. The tooling to answer simply doesn't live where they work, so the invoice shows up as one fat number long after the money's gone.
This isn't a hunch. Every year the FinOps Foundation runs the biggest survey of cloud cost practitioners going. In the 2026 report (1,192 practitioners, representing more than $83 billion in annual cloud spend), 98% said they now manage AI spend. The year before, that was 63%. So the budgets are real, and they're ballooning. Then those same people got asked an open question: name the one tool capability you want and don't have. Top answer? Granular monitoring of AI spend down to tokens and individual LLM requests. The thing they most want to watch is the thing none of them can see.
Read that again. The folks whose entire job is watching cloud costs are flat-out telling you, in their own words, that per-request AI cost is the hole.
Where the money actually leaks
Your monthly bill says you spent $9,000 on an LLM provider. Fine. What it doesn't say is that 12% of those calls were retries of requests that already failed once. Or that a summarizer kept getting asked for JSON, kept wrapping it in markdown the parser choked on, so the same call ran twice for nothing. Somewhere in there a single classifier was firing 800-plus times a week when a cache would have killed most of it. Developers who went digging found exactly this. Hundreds of dollars a month each, buried inside a number that looked perfectly reasonable in aggregate.
You can't cut what you can't see. And most teams right now are flying on instruments that report altitude once a month.
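To make that concrete, here's a rough sketch of the question a monthly total can't answer: how much of the spend was retries. It assumes you already have a per-call log with a cost and a retry flag on each row, which is exactly what most teams don't have. The `CallRecord` type, its field names, and the sample figures are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    feature: str     # hypothetical label, e.g. "ticket-classifier"
    cost_usd: float  # cost of this single call
    is_retry: bool   # True if this call re-ran a request that already failed


def retry_share(records):
    """Return (total_cost, retry_cost, retry_fraction) for a list of calls."""
    total = sum(r.cost_usd for r in records)
    retry = sum(r.cost_usd for r in records if r.is_retry)
    fraction = retry / total if total else 0.0
    return total, retry, fraction


# Made-up sample data: one summarizer call had to be run a second time.
records = [
    CallRecord("summarizer", 0.50, False),
    CallRecord("summarizer", 0.50, True),
    CallRecord("ticket-classifier", 0.25, False),
    CallRecord("ticket-classifier", 0.25, False),
]

total, retry, fraction = retry_share(records)
print(f"total ${total:.2f}, retries ${retry:.2f} ({fraction:.0%})")
```

The arithmetic is trivial. The hard part is having the rows in the first place.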
It's worse at the company level
CloudZero surveyed more than 500 software professionals, manager level and up, for its State of AI Costs 2025 report. Two numbers stuck with me. Only 51% strongly agree they can track the ROI of their AI spend effectively. And 57% were still tracking AI costs in spreadsheets.
Spreadsheets. For a cost that moves per request, in real time, across a handful of model providers at once. By the time somebody types the row in, the money's already history.
Now stack those two together. About half these leaders can't say with any confidence whether the AI spend is paying off. More than half are tracking it by hand in a tool that went obsolete for this job a decade ago. That's not a discipline problem. The instrument in their hand literally can't read the thing they're pointing it at.
Why AI is uniquely bad at this
Normal cloud spend is at least readable. A server runs an hour, you pay for an hour. Done. AI spend is per token, per call, per model, and the price of any single answer rides on how long the prompt was, how long the answer ran, whether it hit a cache, whether the model retried itself, and which model grabbed the request in the first place. Now multiply all that across a few providers. Each one has its own dashboard. Its own billing format. Its own idea of what counts as a token. The per-request picture shatters into pieces, and nobody on earth is sitting there reassembling them.
So "what did that one answer cost" becomes a little forensic project. And nobody runs it. They glance at the total, wince, move on.
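Here's roughly what that forensic project looks like for a single answer, as a sketch. The prices and token counts below are invented, not any provider's real rates, and the function assumes every retry costs the same as the first attempt, which is a simplification. Providers also differ on how they discount cached input, so treat `price_cached_in` as a stand-in.

```python
def request_cost(input_tokens, output_tokens, cached_input_tokens,
                 price_in, price_out, price_cached_in, attempts=1):
    """Estimate the USD cost of one answer.

    Prices are in USD per 1,000,000 tokens. Assumes each attempt
    (the original call plus any retries) costs the same amount.
    """
    fresh_input = input_tokens - cached_input_tokens
    one_attempt = (
        fresh_input * price_in
        + cached_input_tokens * price_cached_in
        + output_tokens * price_out
    ) / 1_000_000
    return one_attempt * attempts


# Hypothetical numbers: a 2,000-token prompt, half of it served from cache,
# a 500-token answer, and one retry (so two attempts in total).
cost = request_cost(
    input_tokens=2000, output_tokens=500, cached_input_tokens=1000,
    price_in=2.0, price_out=8.0, price_cached_in=0.5, attempts=2,
)
print(f"${cost:.4f}")
```

Five inputs for one answer, and every one of them can change from request to request. Swap the model and all three prices move too.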
The fix was never going to be a smarter spreadsheet. It's making the cost show up the moment the answer comes back, stapled to the request, with the model that handled it named right there in front of you. No digging through the rubble afterward.
That's a big part of why Flux puts the model that answered and the cost of that answer in the response headers on every request, and bills it on one invoice instead of five. Not because "transparency" reads nicely on a slide. Because you cannot manage a number you only meet once a month, after it's spent, smeared into a total that hides every place it went sideways.
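On the client side, picking that up could be as small as the sketch below. The header names here are placeholders, since this post doesn't spell out the actual ones, so check the real names before copying this. The function takes any dict-like set of response headers and does no network calls of its own.

```python
# Placeholder header names, assumed for illustration only.
MODEL_HEADER = "x-model"
COST_HEADER = "x-cost-usd"


def cost_from_headers(headers):
    """Pull (model, cost_usd) out of a mapping of response headers.

    Header names are matched case-insensitively. Either value is None
    if its header is missing.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    model = lowered.get(MODEL_HEADER)
    raw_cost = lowered.get(COST_HEADER)
    cost = float(raw_cost) if raw_cost is not None else None
    return model, cost


# Made-up response headers standing in for a real reply.
example_headers = {"X-Model": "example-model-small", "X-Cost-USD": "0.0042"}
model, cost = cost_from_headers(example_headers)
print(model, cost)
```

Log those two values next to the request ID and the per-request picture stops needing reassembly.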
The people who watch cloud costs for a living already pointed at the gap and named it. Tokens and LLM requests. Everyone else is still squinting at a monthly total, doing careful arithmetic on numbers that went dark the second the request left the building.
