Opus 4.8 Would Rather Tell You It Failed
Anthropic's fourth Opus model in six months. The story isn't another point on SWE-bench; it's a model that flags its own broken code and a tool that lets one agent command hundreds of others. An honest scorecard from someone who runs agents all day.