Anthropic's Claude Code Broke — Then They Wrote a 5,000-Word Confession
Anthropic did something yesterday that most tech companies would rather eat glass than do: they published a detailed postmortem explaining why Claude Code got noticeably worse for a few weeks.
The short version? A confluence of factors: training data contamination, an over-optimization for speed that sacrificed depth, and — most embarrassingly — a bug in their evaluation pipeline that masked the degradation until users started screaming.
What's fascinating isn't the technical details (though those are there, all 5,000 words of them). It's the tone. No corporate deflection, no 'we're investigating' hand-waving. Just: here's what broke, here's why, here's how we're fixing it.
The bug in their eval pipeline is particularly juicy. They were measuring the wrong thing — task completion rate instead of task completion *quality*. So the model got faster and sloppier, and the metrics looked fine until real developers started complaining about half-baked code suggestions.
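To make the trap concrete, here's a minimal sketch. It is not Anthropic's actual pipeline; the task results, rubric scores, and metric names are invented for illustration. The point is how a completion-rate metric can report improvement while a quality-weighted metric reveals a regression:

```python
# Hypothetical illustration -- not Anthropic's real eval code.
# Two toy "model versions" attempt the same 100 tasks. Version B
# completes more tasks but with sloppier output. A completion-rate
# metric ranks B higher; a quality-weighted metric surfaces the drop.

from dataclasses import dataclass

@dataclass
class Result:
    completed: bool   # did the model produce an accepted answer?
    quality: float    # 0.0-1.0 rubric score for that answer (assumed)

def completion_rate(results: list[Result]) -> float:
    """The flawed metric: counts completions, ignores how good they are."""
    return sum(r.completed for r in results) / len(results)

def quality_weighted_score(results: list[Result]) -> float:
    """A metric that would catch the regression: credit each
    completion by its rubric quality score."""
    return sum(r.quality for r in results if r.completed) / len(results)

# Version A: slower, completes fewer tasks, but the output is solid.
version_a = [Result(True, 0.9)] * 80 + [Result(False, 0.0)] * 20
# Version B: faster, higher completion rate, noticeably lower quality.
version_b = [Result(True, 0.5)] * 90 + [Result(False, 0.0)] * 10

for name, results in [("A", version_a), ("B", version_b)]:
    print(f"version {name}: completion={completion_rate(results):.2f} "
          f"quality-weighted={quality_weighted_score(results):.2f}")

# Output:
# version A: completion=0.80 quality-weighted=0.72
# version B: completion=0.90 quality-weighted=0.45
```

The general lesson holds regardless of the specifics: any eval that scores only whether a task finished will happily reward a model that finishes more tasks, worse.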
So What?
This matters because transparency in AI is rare. Most companies bury their failures under NDAs and 'improved performance' press releases. Anthropic just showed the industry what accountability looks like. Whether it's a genuine cultural shift or brilliant PR doesn't matter — the bar just got raised for everyone else.
Team Reactions
The eval pipeline bug is the real story. Measuring speed over quality is a classic optimization trap. Every AI team should read this.
5,000 words is either transparency or overcompensation. The fact they published it at all is what counts.
Remember when OpenAI's GPT-4 'got lazier' and they denied it for months? Anthropic just wrote the playbook on how to handle this properly.