On August 31, 2025, 16% of requests to Claude Sonnet 4 came back wrong.
Not slow. Not rate-limited. Wrong. A context-window routing error was shoving some requests at servers set up for the wrong thing, and the model just quietly got worse. Anthropic later wrote up a careful postmortem on it. Three separate infrastructure bugs, overlapping, dragging output quality down for weeks. The worst single hour is where that 16% figure on Sonnet 4 comes from. A second bug corrupted output on Opus 4.1 and Opus 4 between August 25 and 28, and on Sonnet 4 from August 25 through September 2.
Now, credit where it's due. That postmortem is one of the most honest pieces of engineering writing a frontier lab has put out, full stop. They explained exactly what broke. They said plainly that they never throttle quality to save money. They listed the fixes. I'm not having a go at Anthropic here.
If anything it's the reverse. The most transparent, most careful lab in the game shipped two weeks of degraded output across three of its models. So what do you reckon is happening at the labs that don't bother writing postmortems?
You picked one. Now you ride its bad days.
Wire your app to a single provider and you inherit the lot. Their uptime, their rate limits, fine, you knew about those. But also their deprecation calendar, which you absolutely did not sign up for. You signed up for "good model, fair price." The bet hiding underneath is "and I'll be grand every single day this provider is."
Some days it isn't.
November 18, 2025: Cloudflare had an outage that kicked off at 11:20 UTC and wasn't fully sorted until 17:06 UTC. Call it six hours. A database permissions change blew up an internal config file, and a big chunk of the web started throwing 500s. ChatGPT and Claude both went dark in the window, because they live behind the same front door.
Two weeks after that, December 2, 2025, ChatGPT went down again, different reason this time. OpenAI called it a routing misconfiguration. Roughly 3,000 people filed reports on Downdetector before it cleared.
And here's the thing nobody wants to hear: none of this was incompetence. A config file at Cloudflare doubled in size and tripped a hard limit. At OpenAI a routing rule pointed at the wrong place. That's just what running systems at this scale looks like on a normal week. The failures are normal. The weird part, the part you actually chose, is deciding to eat every one of them with nothing to fall back on.
The slow version is worse than the fast version
An outage is loud. Pager goes off, you see the 500s, you know within a minute. Honestly? That's the easy one.
The quiet killer is the deprecation. The model or API you built your whole thing on gets a sunset date, and suddenly there's a migration on your calendar that you never asked to be there.
OpenAI told developers on August 26, 2025 that the Assistants API was deprecated and would be removed exactly a year later, August 26, 2026. Built on Assistants? Your clock started ticking that day. That's a real API change with a hard deadline, not a polite heads-up you can ignore. Consumers get a gentler version of the same medicine. On February 13, 2026, OpenAI retired GPT-4o, GPT-4.1, GPT-4.1 mini, and o4-mini from the ChatGPT app, because basically everyone had moved on anyway. The API itself stayed put that day. The Assistants clock didn't.
From their side a deprecation is totally reasonable. From yours it's a forced re-test, a re-tune of prompts you wrote around one specific model's quirks, and a nervy little window where you're just hoping the replacement behaves close enough to ship. You didn't ask for any of that work. Their roadmap dropped it on your desk and walked off.
What single-provider really means
Your reliability is capped at theirs. That's the whole deal, the entire thing, right there. You can write flawless code on top of a provider having a rough week and still hand your users a rough week.
Watch how it rhymes. Teams routing everything through one Anthropic model ate the full degradation in late August. Apps pinned to one front-end provider went dark for an afternoon in November. The shops that built only on Assistants now own a migration deadline they never picked. Same shape every time, and the bill always lands on whoever had no second path.
The fix is not "trust a different provider more." Different provider, same trap. The fix is not betting your whole load on any single one of them. If a prompt can be served by four capable models across three providers, then a bad hour at one of them stops being an outage and becomes a routing decision. The request just goes somewhere healthy. That's the thing we built Flux to do, so "which provider is having a bad day" quietly stops being your problem to babysit.
You're going to have outages. Everyone does. The only thing you actually get to decide is whether one company's bad Tuesday gets to be yours as well.
