You wired up a model. It worked. You moved on. Totally the right call.
It's also why you're overpaying right now. The model didn't get worse. The deal did. Somewhere along the way a cheaper or smarter option shipped, and your code slept through it, because code doesn't read changelogs.
And here's the annoying part. This isn't a one-time fix you patch and forget. The price-quality frontier shifts closer to monthly than yearly, and every time it does, your hardcoded default falls a bit further behind.
The frontier moves while you're not looking
Two real moments, two days apart.
August 6, 2024. OpenAI shipped a new GPT-4o snapshot and halved input token prices. Five dollars per million input tokens dropped to $2.50. Output went from $15 to $10. Same family, basically the same quality that week, half the input bill. Pinned the May version? You kept paying the old rate until somebody went back and changed a string.
Two days later, on August 8, 2024, Google announced a price cut of about 79% on Gemini 1.5 Flash input. Thirty-five cents per million input tokens down to 7.5 cents, with output at 30 cents. So inside a single week, two of the biggest providers repriced their workhorse models, by wildly different amounts, and neither of them called to let you know.
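To put those numbers on a bill, here's a quick back-of-envelope sketch. The prices are the ones quoted above. The monthly token volumes are made up for illustration, so plug in your own.

```python
# Prices in USD per million tokens, as quoted above.
GPT4O_OLD = {"input": 5.00, "output": 15.00}
GPT4O_NEW = {"input": 2.50, "output": 10.00}

# Hypothetical workload in millions of tokens per month (an assumption, not real data).
WORKLOAD = {"input": 100, "output": 20}


def monthly_cost(prices, workload):
    """Dollars per month for a workload measured in millions of tokens."""
    return sum(prices[kind] * workload[kind] for kind in workload)


def percent_cut(old, new):
    """Percentage drop from old to new."""
    return (old - new) / old * 100


old_bill = monthly_cost(GPT4O_OLD, WORKLOAD)
new_bill = monthly_cost(GPT4O_NEW, WORKLOAD)
print(f"GPT-4o: ${old_bill:.2f} -> ${new_bill:.2f} "
      f"({percent_cut(old_bill, new_bill):.1f}% lower)")
print(f"Gemini 1.5 Flash input: {percent_cut(0.35, 0.075):.1f}% lower")
```

On this made-up workload the GPT-4o bill drops from $800 to $450 a month, a bit under 44%, and the only thing that has to change is a model string.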
And those are the ones big enough to make headlines. The quieter cuts, the new entrant undercutting everyone on a random Monday, the model that's suddenly twice as good for the same money. None of that lands in your inbox.
Your default is a bet you stopped watching
You hardcode gpt-4o or gemini-1.5-pro or whatever won the day you shipped. That's a frozen decision. It stays correct exactly as long as the market holds still. The market does not hold still.
The failure is quiet, and quiet is what makes it expensive. Nothing breaks. No alert fires. Your bill just runs 30% or 50% over what it should, month after month, against a number you never recompute. A bad deal that throws errors gets fixed by lunch. A bad deal that works perfectly can run for a year before anyone blinks.
The inertia is real too. Who wants to be the person who swaps the model in the checkout flow because some blog post said a new one is cheaper? Now you're re-testing, re-validating, re-prompting, maybe re-tuning. So you leave it. Perfectly rational. And the default keeps aging while the gap compounds.
Now do that everywhere you call a model. The summarizer. The classifier. The chat endpoint. That batch job nobody's owned since the person who wrote it left. Every one of them is its own frozen bet, placed on a different day, against a frontier that's moved several times since. A dozen stale defaults at a dozen different vintages, and you couldn't list them all if I asked.
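If you want to actually build that list, a rough scan gets you most of the way. This is a minimal sketch, not a real audit tool: the regex and the file extensions are assumptions, and `find_hardcoded_models` is a hypothetical helper name. Extend the pattern with whatever model families your code calls.

```python
import re
from pathlib import Path

# Hypothetical pattern: matches names like gpt-4o or gemini-1.5-pro.
# Add other model families here as needed.
MODEL_PATTERN = re.compile(r"\b((?:gpt|gemini)-[\w.-]+)")


def find_hardcoded_models(root, suffixes=(".py", ".ts", ".yaml", ".json")):
    """Return sorted (path, line_number, model_name) tuples for model names under root."""
    hits = []
    for path in Path(root).rglob("*"):
        # Only look at regular files with one of the chosen extensions.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_number, line in enumerate(text.splitlines(), start=1):
            for match in MODEL_PATTERN.finditer(line):
                hits.append((str(path), line_number, match.group(1)))
    return sorted(hits)
```

Call `find_hardcoded_models(".")` from the repo root and print the tuples. Each hit is one frozen bet, and the file's last-modified date tells you roughly how old it is.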
Picking better just resets the same clock
So you go shopping. Read the new benchmarks, crown the new winner, hardcode that one. Congratulations, you just bought the exact same problem on a six-month timer. Your shiny new pick is somebody's stale default by Christmas.
The real move is to stop choosing a model once and welding it into a config file. Decide it per request, against today's prices and today's options, and let something else eat the churn so your code never has to care who repriced last Tuesday.
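Here's roughly what "decide it per request" looks like in code. This is a toy sketch under stated assumptions, not how any particular router works: `ModelOption`, `CATALOG`, and `pick_model` are hypothetical names, the prices are the figures quoted earlier, and the quality tiers are placeholders you'd replace with results from your own evals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    input_price: float   # USD per million input tokens
    output_price: float  # USD per million output tokens
    quality: int         # hypothetical 1-5 tier, assigned from your own evals


# Illustrative catalog. The quality tiers are placeholders, not benchmark
# results. In practice this table would be refreshed from current pricing
# rather than typed in once.
CATALOG = [
    ModelOption("gpt-4o", 2.50, 10.00, quality=5),
    ModelOption("gemini-1.5-flash", 0.075, 0.30, quality=3),
]


def estimated_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of one request on the given model."""
    total = model.input_price * input_tokens + model.output_price * output_tokens
    return total / 1_000_000


def pick_model(catalog, min_quality, input_tokens, output_tokens):
    """Cheapest model in the catalog that meets the request's quality floor."""
    eligible = [m for m in catalog if m.quality >= min_quality]
    if not eligible:
        raise ValueError(f"no model meets quality tier {min_quality}")
    return min(eligible, key=lambda m: estimated_cost(m, input_tokens, output_tokens))
```

With this catalog, `pick_model(CATALOG, 3, 2000, 300)` returns the Gemini 1.5 Flash entry and `pick_model(CATALOG, 5, 2000, 300)` returns GPT-4o. The selection logic is the easy part. The catch is that someone still has to keep the catalog current.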
That's the job we built Flux for. One API key, 30-plus models behind it, routing that picks a sensible model per request instead of chaining you to the one that happened to be smart the day you typed its name. The frontier moves, the routing moves, and tracking it stops being your problem.
So the model you picked six months ago was never really the problem. The problem is nobody set a reminder to look again. And nobody ever will, because checking is not a thing humans remember to do.
