Anthropic’s New Claude Sonnet 4.6 Is Beating Its Own Flagship Model

Anthropic is rolling out Claude Sonnet 4.6, calling it the most capable Sonnet model so far and describing it as a broad upgrade across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. A headline feature is a 1M token context window (beta), aimed at letting the model work effectively over extremely large inputs like entire codebases, long contracts, or many research papers at once. For Free and Pro users, Sonnet 4.6 becomes the default model in claude.ai and Claude Cowork, while pricing stays the same as Sonnet 4.5 (starting at $3/$15 per million tokens). The company also emphasizes safety evaluation results, saying researchers found it as safe as or safer than recent Claude models, with strong safety behaviors and no major high-stakes misalignment concerns.
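
To make the pricing concrete, here is a minimal sketch of the per-request arithmetic. It assumes "$3/$15 per million tokens" means $3 per million input tokens and $15 per million output tokens, and it ignores any other rates that "starting at" may imply. The function name and the example token counts are hypothetical.

```python
# Assumption: "$3/$15 per million tokens" is read as $3 per million input
# tokens and $15 per million output tokens (the article's starting prices).
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost of one request in US dollars."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 200,000 tokens in, 4,000 tokens out.
    print(f"${estimate_cost_usd(200_000, 4_000):.2f}")
```

Under these assumptions the input side dominates for long-context work: the 200,000 input tokens account for $0.60 of the estimate, and the 4,000 output tokens for $0.06.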

Key points

“Full upgrade” across multiple skills: Anthropic frames Sonnet 4.6 as a comprehensive improvement (coding, computer use, long-context reasoning, agent planning, knowledge work, design), not a narrow tweak.
1M-token context window (beta): Positioned as large enough to hold entire codebases and other very long materials in a single request, with the added claim that it can reason effectively across all that context.
Default for Free and Pro: Sonnet 4.6 is now the default model for those tiers in claude.ai and Claude Cowork, with pricing unchanged vs. Sonnet 4.5.
Developer preference metrics (Claude Code testing): In early testing, users preferred Sonnet 4.6 over Sonnet 4.5 about 70% of the time, and over Opus 4.5 (Anthropic's November 2025 frontier model) 59% of the time.
Computer use progress + OSWorld framing: Anthropic highlights steady gains on OSWorld over 16 months and claims early users are seeing human-level capability in some tasks (like complex spreadsheets and multi-step web forms), while still lagging “the most skilled humans.”
Prompt injection risk acknowledged: The announcement flags that computer use introduces risks such as prompt injection attacks, and claims safety evaluations show Sonnet 4.6 is a major improvement over Sonnet 4.5 and resists these attacks about as well as Opus 4.6.
Long-horizon planning example (Vending-Bench Arena): Sonnet 4.6 is described as adopting a strategy of investing heavily in capacity early, then pivoting to profitability late, helping it finish “well ahead” of competitors in that evaluation.
Product/platform updates: The announcement mentions support for adaptive thinking, extended thinking, and context compaction (beta) on the Claude Developer Platform. The API’s web search and web fetch tools now “automatically write and execute code” to filter and process search results, and several tooling capabilities are said to be generally available (code execution, memory, programmatic tool calling, etc.).
Claude in Excel connectors: The Excel add-in now supports MCP connectors (e.g., S&P Global, LSEG, Daloopa, PitchBook, Moody’s, FactSet) on paid tiers (Pro/Max/Team/Enterprise), enabling pulling external context into Excel workflows.

Key quotes

“Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design.”
“Sonnet 4.6 also features a 1M token context window in beta.”
“Performance that would have previously required reaching for an Opus-class model (including on real-world, economically valuable office tasks) is now available with Sonnet 4.6.”
Safety eval conclusion: “a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns around high-stakes forms of misalignment.”
On computer-use automation constraints: “Almost every organization has software it can’t easily automate… But a model that can use a computer the way a person does changes that equation.”
On progress pace: “But the rate of progress is remarkable nonetheless.”
On early user preference vs. Opus 4.5: users rated it “significantly less prone to overengineering and ‘laziness,’ and meaningfully better at instruction following.”
On long-context value: “Sonnet 4.6’s 1M token context window is enough to hold entire codebases, lengthy contracts, or dozens of research papers in a single request. More importantly, Sonnet 4.6 reasons effectively across all that context.”
On prompt injection: “malicious actors can attempt to hijack the model by hiding instructions on websites in what’s known as a prompt injection attack.”

Implications

Sonnet is being positioned as “good enough to replace Opus-class usage” for many practical tasks, which could shift default developer and enterprise workflows toward a cheaper model tier, especially if the preference metrics (about 70% vs. Sonnet 4.5; 59% vs. Opus 4.5) hold up beyond early testing.
A 1M-token context window (even in beta) signals a push toward whole-project reasoning, encouraging use cases like “entire codebase” work sessions, long contract review, or multi-paper research synthesis without chunking, assuming the model’s claimed ability to reason across all that context translates to real-world reliability.
Computer-use capability is framed as a bridge over the “no API” gap for legacy or specialized internal tools, implying a route to automation without bespoke connectors, while also raising the stakes on prompt injection defenses and safety practices.
Context compaction and automated code-based filtering of search results point to a product direction focused on keeping conversations long and useful without ballooning costs, suggesting more “agentic” workflows where the system manages context intelligently.
Excel MCP connectors expand Claude’s footprint into finance and data workflows, making it easier for spreadsheet-heavy teams to bring external sources into analysis without leaving Excel, potentially increasing adoption in research and financial analysis environments tied to those data providers.
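
For the "without chunking" use case above, a rough pre-check of whether a set of documents would fit in a 1M-token window might look like the sketch below. The four-characters-per-token ratio is an assumed rule of thumb, not the model's actual tokenizer, and the function names are hypothetical; real token counts can differ noticeably, especially for code and non-English text.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # beta window size cited in the article
CHARS_PER_TOKEN = 4                # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate a token count from the character count, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts: list[str], reserve_for_output: int = 0) -> bool:
    """Check whether the texts plus a reserved output budget fit in the window."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve_for_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Hypothetical inputs: three documents of 1.2 million characters each.
    documents = ["x" * 1_200_000] * 3
    print(fits_in_context(documents, reserve_for_output=8_000))
```

Reserving part of the window for the model's reply is a design choice in this sketch: a request that only just fits on the input side leaves no room for output.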

Source: https://www.anthropic.com/news/claude-sonnet-4-6