What a Million Tokens of Context Actually Changes
Anthropic just made 1M context generally available at standard pricing. The interesting part isn't the number — it's what becomes possible when you stop engineering around context limits.
Anthropic announced this week that the full 1M context window for Claude Opus 4.6 and Sonnet 4.6 is now generally available at standard pricing. No long-context premium. A 900K-token request costs the same per-token rate as a 9K one — $5/$25 per million tokens for Opus, $3/$15 for Sonnet.
This isn't just a pricing change. It's the removal of an engineering constraint that has shaped how every serious AI application has been built for the past three years.
The Context Tax
Every AI-powered system we've built has included context management as a core engineering concern. Summarise earlier conversation to fit the window. Chunk documents and retrieve relevant sections. Compress tool call results. Clear history when it gets too long. These aren't features — they're workarounds for a fundamental limitation, and every one of them loses information.
The most insidious form of context management is compaction — the automatic summarisation that tools like Claude Code perform when a session gets too long. Compaction is lossy by definition. The system decides what's important and discards the rest. In a debugging session, the detail it discards might be the exact error message you need. In a long planning session, the early architectural decision that constrains everything downstream gets compressed into a sentence that loses the reasoning behind it.
Anthropic's blog post includes a telling quote: "Claude Code can burn 100K+ tokens searching Datadog, Braintrust, databases, and source code. Then compaction kicks in. Details vanish. You're debugging in circles." That's not a Claude Code problem — it's a context window problem, and every AI coding tool has some version of it.
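Compaction strategies vary by tool, but the failure mode is easy to demonstrate. The sketch below is a hypothetical compactor, not Claude Code's actual logic: when the history exceeds its token budget, it keeps the first and last few messages and collapses the middle into a one-line summary. The detail you needed is exactly what disappears.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def compact(history: list[str], budget: int,
            keep_head: int = 1, keep_tail: int = 2) -> list[str]:
    """Collapse the middle of the history when it no longer fits the budget."""
    if sum(estimate_tokens(m) for m in history) <= budget:
        return history
    dropped = history[keep_head:-keep_tail]
    summary = f"[summary of {len(dropped)} earlier messages]"
    return history[:keep_head] + [summary] + history[-keep_tail:]

history = [
    "user: debug the checkout service",
    "tool: datadog trace shows p99 spike at 14:02 UTC",
    "tool: error log: KeyError: 'currency' in totals.py line 88",  # the crucial detail
    "assistant: hypothesis: missing currency field on legacy carts",
    "user: check the legacy cart schema",
]

compacted = compact(history, budget=30)
# The exact error message is gone; only a vague summary remains.
assert not any("KeyError" in m for m in compacted)
```

The summariser here is deliberately dumb, but a smarter one has the same problem: it has to decide what matters before it knows what the session will need.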
What 1M Actually Enables
With a million tokens of context, several categories of work that previously required elaborate engineering become straightforward:
Entire codebases in context. Not retrieved chunks. Not summarised overviews. The actual code. A million tokens is roughly 750,000 words — enough to hold most production codebases in their entirety. The model can see the function definition, every place it's called, the tests that cover it, and the documentation that describes it, all at once.
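In practice, "the whole codebase in context" is almost embarrassingly simple. The sketch below walks a source tree, concatenates every file with a path header, and checks the result against the window. The 4-characters-per-token estimate and the extension filter are assumptions for illustration; a real pipeline would use a proper token counter from the provider's SDK.

```python
from pathlib import Path

WINDOW = 1_000_000
SOURCE_EXTS = {".py", ".ts", ".go", ".md", ".sql"}  # assumed filter, adjust per repo

def pack_repo(root: Path) -> str:
    """Concatenate every source file under root, each prefixed with its path."""
    parts = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in SOURCE_EXTS:
            rel = path.relative_to(root)
            parts.append(f"--- {rel} ---\n{path.read_text()}")
    return "\n\n".join(parts)

def fits_in_window(prompt: str, window: int = WINDOW) -> bool:
    # Rough heuristic: ~4 characters per token.
    return len(prompt) // 4 <= window

# Usage sketch (assumes a local ./myrepo directory):
# prompt = pack_repo(Path("myrepo"))
# if fits_in_window(prompt):
#     ...send as a single request: no chunking, no retrieval index
```

That's the whole pipeline. Compare it to the embedding index, chunking strategy, and retrieval tuning a RAG setup needs to approximate the same visibility.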
Long-running agent sessions. An AI agent that runs for hours — researching, building, testing, iterating — accumulates a massive context of tool calls, observations, intermediate reasoning, and decisions. Previously, this context had to be periodically compressed, losing the thread of reasoning. With 1M tokens, the full trace stays intact. The agent at hour three still has access to the decision made in minute five.
Document-heavy analysis. The new limit supports up to 600 images or PDF pages per request, up from 100. Contract review, regulatory analysis, due diligence — work that involves hundreds of pages of source material can now be processed in a single pass rather than chunked and cross-referenced.
The Compaction Problem Isn't Solved, It's Bypassed
The elegant thing about a larger context window is that it doesn't fix the summarisation problem — it makes it irrelevant for most practical use cases. Lossy summarisation of earlier context was always a bad solution to a real constraint. The better solution was always "don't summarise, just remember everything." That's now feasible for sessions that would previously have hit the wall.
Anthropic reports Opus 4.6 scores 78.3% on MRCR v2 — a benchmark for multi-document retrieval and comprehension at long context lengths — the highest among frontier models. This matters because a large context window is useless if the model can't actually attend to and reason over all of it. The history of context window expansion has included models that technically supported large windows but degraded significantly in the middle and tail. A 78.3% MRCR score at 1M tokens suggests the quality is holding.
What This Means for How We Build
We use Claude as a core part of our development process. This conversation — the one that produced this blog post — has been running for hours, building an entire website from scratch: content strategy, architecture, implementation, deployment, case studies, blog pipeline, admin system. The context window has held the full history of decisions, code, and reasoning without compaction. That's not a demo scenario. That's a real working session.
The practical implications for software engineering are significant:
- Spec-driven development scales better. When the full specification — behaviours, API contracts, edge cases, acceptance criteria — can live in context alongside the implementation, there's no gap between what was specified and what the model is building. No "I lost track of requirement 47."
- Debugging gets dramatically better. The debugging loop of "read error → check code → check logs → form hypothesis → test" can now happen within a single context that holds all the relevant information simultaneously.
- Review and audit become tractable. Reviewing an entire pull request — every file, every test, every dependency change — in a single context pass, rather than file-by-file with lossy summaries of what came before.
The Cost Question
At standard pricing with no long-context premium, a request that fills the full 1M-token window costs roughly $5 in input for Opus, plus output at $25 per million tokens. That's not cheap for a single API call, but it's transformative economics for work that previously required multiple calls with context management overhead, RAG infrastructure, or human-in-the-loop summarisation.
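The arithmetic is worth making explicit. A minimal cost helper at the per-million rates quoted earlier (the token counts in the usage lines are illustrative):

```python
RATES = {  # (input $/M tokens, output $/M tokens), flat regardless of context length
    "opus": (5.00, 25.00),
    "sonnet": (3.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the flat per-token rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A window-filling Opus request: 1M tokens in, 8K out.
cost = request_cost("opus", 1_000_000, 8_000)
assert round(cost, 2) == 5.20  # $5.00 input + $0.20 output

# The flat rate is the point: 900K of input costs exactly 100x what 9K costs.
assert request_cost("opus", 900_000, 0) == 100 * request_cost("opus", 9_000, 0)
```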
For Claude Code users on Max, Team, or Enterprise plans, 1M context is included with Opus 4.6 by default. No configuration. No opt-in. Sessions just work longer before hitting limits.
The real cost saving isn't in the per-token rate — it's in the engineering time you don't spend building and maintaining context management systems. Every RAG pipeline, every summarisation layer, every context window manager is engineering debt that exists because of a constraint that just got 5x wider.
What We're Watching
The million-token window is available now on the Claude Platform, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. The question is how quickly the ecosystem adapts — tool builders, agent frameworks, and development workflows that were all designed around 200K limits.
The frameworks that win will be the ones that treat large context as a first-class capability rather than an edge case. That means rethinking RAG (when do you still need retrieval vs. just loading everything?), rethinking agent memory (when is persistent memory necessary vs. just keeping the session going?), and rethinking document processing (when do you chunk vs. process whole?).
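The first of those questions reduces to a surprisingly blunt heuristic. The threshold below is an assumption for illustration, not a recommendation: load everything when the corpus fits with headroom for instructions and output, and fall back to retrieval only when it genuinely cannot.

```python
def context_strategy(corpus_tokens: int, window: int = 1_000_000,
                     reserve: int = 100_000) -> str:
    """Pick a strategy, reserving headroom for instructions and output."""
    usable = window - reserve
    if corpus_tokens <= usable:
        return "load-everything"   # no chunking, no embedding index
    return "retrieve"              # corpus genuinely exceeds the window

assert context_strategy(400_000) == "load-everything"
assert context_strategy(2_500_000) == "retrieve"
# The same corpus under the old 200K window would have required retrieval:
assert context_strategy(400_000, window=200_000) == "retrieve"
```

The last assertion is the interesting one: a large class of corpora that used to mandate RAG infrastructure now sits comfortably below the threshold.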
The constraint that shaped three years of AI engineering just changed. The interesting work now is figuring out what to build differently.