Claude Opus 4.6
The Biggest Upgrade Since... Well, Since Opus 4.5
Anthropic dropped Opus 4.6 yesterday (February 5, 2026), and it's a significant leap. I've been using it since it went live, and here's everything that matters.
TL;DR
One-million-token context window. 128k output tokens. Agent teams in Claude Code. Adaptive thinking. PowerPoint integration. And it beats GPT-5.2 on most benchmarks. Same price.
What Actually Changed
1M Token Context Window
This is the headline feature. Opus went from 200k to 1 million tokens — roughly 750,000 words in a single session. That's an entire codebase, a full legal document set, or a year's worth of meeting notes loaded at once.
More importantly, they've mostly solved the "context rot" problem. Previous models would degrade as conversations got longer — the model would start forgetting things buried deep in the context. Opus 4.6 scores 76% on the MRCR v2 benchmark (a needle-in-a-haystack test), compared to just 18.5% for Sonnet 4.5. That's not an incremental improvement. That's a qualitative shift.
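If you want to see how much of that window a given corpus actually eats, the Python SDK can count tokens before you commit to a request. Here's a minimal sketch: the model ID comes from the release, the file path is hypothetical, and whether the 1M window needs an opt-in beta header (as Sonnet's long-context beta did) is something to verify against the docs.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Load a large corpus -- e.g. an entire codebase concatenated into one file.
with open("codebase_dump.txt") as f:  # hypothetical path
    corpus = f.read()

messages = [{
    "role": "user",
    "content": f"<codebase>\n{corpus}\n</codebase>\n\nSummarize the architecture.",
}]

# Count tokens first, so you know whether you fit in the 1M window
# (and whether you'll cross the 200k premium-pricing threshold).
count = client.messages.count_tokens(
    model="claude-opus-4-6",
    messages=messages,
)
print(f"Prompt size: {count.input_tokens:,} tokens")

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=messages,
)
print(response.content[0].text)
```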
128k Output Tokens
You can now get up to 128k tokens in a single response. If you've ever hit that frustrating wall where Claude cuts off mid-code-generation and you have to ask it to continue, this should help significantly.
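For generations that long you'll want to stream rather than wait on one giant response. A quick sketch with the Python SDK's streaming helper; I'm assuming the full 128k ceiling is requestable in a single call without extra flags.

```python
import anthropic

client = anthropic.Anthropic()

# max_tokens is a ceiling, not a target -- the model stops when it's done.
with client.messages.stream(
    model="claude-opus-4-6",
    max_tokens=128_000,  # the new output ceiling, per the release
    messages=[{"role": "user", "content": "Port this 5,000-line module to Rust: ..."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```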
Adaptive Thinking
Previously, extended thinking was a binary toggle — on or off. Now the model can decide how much to think based on the task complexity. If you ask it something simple, it answers quickly. If you throw a gnarly debugging problem at it, it takes its time.
There are four effort levels: low, medium, high (the default), and max, exposed through the /effort parameter. Anthropic notes that Opus 4.6 sometimes "overthinks" simpler tasks, so dialing down to medium for routine work is a good idea.
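On the API side, I haven't confirmed the exact field name for effort, so treat this sketch as a guess shaped by the /effort naming; passing it through the SDK's extra_body escape hatch keeps the call from breaking if the typed parameter isn't there yet.

```python
import anthropic

client = anthropic.Anthropic()

# Routine task: dial effort down so the model doesn't overthink it.
# The `effort` field name is an assumption based on the post's /effort
# naming -- verify the exact parameter against the API reference.
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Rename this helper and update its docstring."}],
    extra_body={"effort": "medium"},
)
print(response.content[0].text)
```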
Agent Teams in Claude Code
This one is massive for developers. Instead of a single Claude agent working through tasks sequentially, you can now spin up multiple agents that work in parallel and coordinate autonomously. Think of it as having a small engineering team instead of a single developer.
Best use case: tasks that split into independent, read-heavy work like codebase reviews, large refactors, or running analyses across multiple files simultaneously.
You can even take over any subagent directly using Shift+Up/Down or tmux. It's in research preview via the API right now.
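Since the agent-teams API is still in research preview, I won't guess at its shape. But the pattern it automates is easy to sketch with plain API calls: fan independent, read-heavy tasks out in parallel and merge the results. This is the idea, not the feature.

```python
import asyncio
import anthropic

client = anthropic.AsyncAnthropic()

async def review(path: str) -> str:
    """One 'agent': reviews a single file independently of the others."""
    with open(path) as f:
        source = f.read()
    response = await client.messages.create(
        model="claude-opus-4-6",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"Review this file for bugs and risky patterns:\n\n{source}",
        }],
    )
    return f"## {path}\n{response.content[0].text}"

async def main() -> None:
    # Independent, read-heavy tasks are the sweet spot for parallelism.
    files = ["auth.py", "billing.py", "sync.py"]  # hypothetical paths
    reviews = await asyncio.gather(*(review(p) for p in files))
    print("\n\n".join(reviews))

asyncio.run(main())
```

The gather call is the whole trick: the three reviews run concurrently instead of back to back, which is exactly the shape of work agent teams are built for.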
Context Compaction
For long-running agentic tasks, Claude can now automatically summarize and replace older context when approaching the window limit. This means your agents can run for much longer without crashing into context limits. Finally.
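The built-in version is automatic, but the mechanic is simple enough to sketch by hand: once the running conversation nears a threshold, summarize the older turns and splice the summary in as replacement context. The threshold and summarization prompt below are my own illustrative choices, not Anthropic's.

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-opus-4-6"
COMPACT_AT = 800_000  # illustrative threshold, well under the 1M window

def maybe_compact(messages: list[dict]) -> list[dict]:
    """Replace older turns with a summary once the context gets large."""
    used = client.messages.count_tokens(model=MODEL, messages=messages).input_tokens
    if used < COMPACT_AT:
        return messages
    old, recent = messages[:-4], messages[-4:]  # keep the last few turns verbatim
    summary = client.messages.create(
        model=MODEL,
        max_tokens=2048,
        messages=old + [{
            "role": "user",
            "content": "Summarize everything above: decisions made, open tasks, key facts.",
        }],
    )
    compacted = {
        "role": "user",
        "content": f"[Summary of earlier conversation]\n{summary.content[0].text}",
    }
    return [compacted] + recent
```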
Office Integrations
Claude in PowerPoint
New research preview. Claude now works directly inside PowerPoint as a side panel. It reads your existing layouts, fonts, and slide masters to stay on-brand. No more "create a deck in Claude → download → fix formatting in PowerPoint" workflow. You can now build and iterate directly.
Available for Max, Team, and Enterprise subscribers.
Claude in Excel (Upgraded)
The Excel integration got smarter. It can now ingest messy, unstructured data and figure out the right structure without you having to explain it. Multi-step changes in one pass. It plans before it acts.
Benchmarks
For the numbers people:
- Terminal-Bench 2.0 (agentic coding): 65.4% — highest score ever recorded
- GDPval-AA (knowledge work in finance/legal): 1606 Elo — 144 points ahead of GPT-5.2
- Humanity's Last Exam (multidisciplinary reasoning): leads all frontier models
- BrowseComp (finding hard-to-find info online): best in industry
- OSWorld (computer use): 72.7%, up from 66.3%
- MRCR v2 (long-context retrieval): 76% vs 18.5% for Sonnet 4.5
Not everything improved: SWE-bench Verified and MCP Atlas both showed small regressions. A bit of an anomaly given the gains everywhere else.
Pricing
No changes. $5/$25 per million input/output tokens. If you're using the 1M context window (prompts exceeding 200k tokens), premium pricing kicks in at $10/$37.50 per million.
US-only inference is available at 1.1× token pricing for compliance-sensitive workloads.
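To make the tiers concrete, here's the arithmetic, assuming (as with Anthropic's earlier 1M-context beta) that crossing 200k input tokens reprices the entire request rather than just the overflow:

```python
def request_cost(input_tokens: int, output_tokens: int, us_only: bool = False) -> float:
    """Estimate one request's cost in USD under the published rates."""
    # Assumption: exceeding 200k input tokens reprices the whole request,
    # as in Anthropic's earlier long-context beta -- verify against the docs.
    if input_tokens > 200_000:
        in_rate, out_rate = 10.00, 37.50   # premium, per million tokens
    else:
        in_rate, out_rate = 5.00, 25.00    # standard, per million tokens
    cost = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
    return cost * 1.1 if us_only else cost  # US-only inference is 1.1x

# A 500k-token prompt with a 20k-token answer:
print(f"${request_cost(500_000, 20_000):.2f}")  # $5.75
```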
What It Feels Like
This is the model I'm writing this post with (meta, I know). The difference from Opus 4.5 is noticeable — particularly in how it handles larger codebases and sustained reasoning over long sessions. It doesn't lose the thread as easily. It catches its own mistakes better during code review. And when it encounters ambiguity, it makes better judgment calls instead of asking you to clarify every little thing.
Michael Truell from Cursor described it well: "stronger tenacity, better code review, and it stays on long-horizon tasks where others drop off."
For those of us using Claude Code daily, agent teams are going to be a workflow changer. I'm already thinking about how to restructure my development sessions around parallel agents.
The Bigger Picture
Opus 4.6 dropped the same week OpenAI launched Codex, and the enterprise AI race is clearly accelerating. Claude Code reportedly hit $1 billion in run-rate revenue just six months after going GA. Forty-four percent of enterprises now use Anthropic in production, up from near zero in early 2024.
The model is available now on claude.ai, the API (claude-opus-4-6), Amazon Bedrock, and Google Cloud Vertex AI.
Oh, and Anthropic is apparently running a Super Bowl ad on Sunday. We're definitely not in the early adopter phase anymore.