The hidden cost creep in Claude Opus 4.7

By Jameson Daines · 2026-05-23 · 6 min read

Anthropic released Claude Opus 4.7 on April 16, 2026. They didn't touch the per-token prices. And yet, if you swapped Opus 4.6 for Opus 4.7 in your workflow without changing anything else, your API bill probably went up by around 35%.

That's not a mistake. It's a consequence of a tokenizer change that most BYOK users won't notice until they look at a billing statement and do the math. I want to walk through exactly what happened, what it costs in real numbers, and what to do about it, particularly if you're an attorney, CPA, or consultant using AI daily for client work.

What a tokenizer actually does

Before getting into the numbers, a quick grounding on what we're talking about. A tokenizer is the piece of software that converts your text into the numerical units the model processes. "Hello, world!" is two tokens in some tokenizers, three in others. The split is arbitrary from your perspective as a user, but it's completely deterministic from the model's perspective.

Anthropic updated their tokenizer with Opus 4.7. The new tokenizer is more efficient in certain respects, particularly for code and structured data. But for the kind of prose-heavy, context-rich input that attorneys and CPAs send (a privileged memo draft, a client intake summary, a tax position analysis with supporting facts) it generates roughly 35% more tokens for the same text.

Per-token price: unchanged. Tokens per request: up 35%. Effective cost per request: up 35%.

That's the whole story. But it's worth running through a concrete example because the magnitude surprised me when I first did the arithmetic.

A real example: client matter analysis at 5,000 input tokens

Say you're running a client matter analysis. You've got a prompt that includes the client's background, the relevant facts, a summary of applicable authority, and a structured output request for a memo. Call it 5,000 tokens of input under the old Opus 4.6 tokenizer. The response is around 1,200 tokens. You run this a few times a week.

Under Opus 4.6, with Anthropic's pricing of $15 per million input tokens and $75 per million output tokens (Opus-tier pricing in effect at that time), that run cost roughly $0.075 for input and $0.090 for output, about $0.165 per run.

Under Opus 4.7, current pricing is $5 per million input tokens and $25 per million output tokens. But the new tokenizer now sees your 5,000-token input as approximately 6,750 tokens. At those rates, that's about $0.034 for input and $0.030 for output, roughly $0.064 per run.

Wait, that looks cheaper. And in this particular case it is, because Anthropic also restructured Opus pricing alongside the tokenizer change. The point isn't that Opus 4.7 is always more expensive in absolute terms. The point is that the tokenizer change means you're using more tokens than you think, and any cost model you built before April 16 is now wrong. Not because the per-token prices changed, but because the number of tokens your input generates changed.

If you estimated your BYOK usage costs in early 2026, before April 16, that estimate is off. The per-token prices are the same. The token count for your typical prompt is not.

Current Anthropic pricing as of this writing: Haiku 4.5 at $1/$5 per million input/output tokens, Sonnet 4.6 at $3/$15, Opus 4.7 at $5/$25. You can cross-reference these at aipricing.guru which tracks changes as they happen.

Haiku 3 is gone and it mattered

Three days after Opus 4.7 launched, on April 19, Anthropic retired Claude Haiku 3. That model was priced at $0.25 per million input tokens and $1.25 per million output tokens. It was the cheapest available Claude model by a wide margin.

Haiku 3 was the model you used for high-volume, low-stakes tasks: summarizing long documents, classifying correspondence, reformatting content for templates. For a law firm or CPA practice with a lot of document intake, Haiku 3 was the sensible cost-control choice for the work that didn't need Sonnet-level reasoning. Haiku 4.5, its replacement, is priced at $1/$5 per million tokens. That's 4x the input cost and 4x the output cost for those high-volume steps.

For a solo attorney running a document classification step 500 times a month on 800-token documents, Haiku 3 cost about $0.10 per month. Haiku 4.5 costs about $0.40 per month. Not a crisis in isolation. But multiply that across several workflow steps, and it registers. The floor price for Claude API usage has moved up.

Three levers that actually help

None of this means you should stop using the API. It means you should be deliberate about which model you're reaching for and when. Here's how I think about it for professional work.

1. Use Sonnet 4.6 for most client work

Sonnet 4.6 at $3/$15 per million tokens handles the work that Opus used to handle at a fraction of the cost. The quality gap between Sonnet and Opus has narrowed with every model generation. For memo drafting, issue analysis, document summarization, client correspondence: Sonnet 4.6 gets you 90% of Opus quality at 60% of the input cost. Reserve Opus 4.7 for tasks where the quality difference demonstrably shows up: complex multi-document reasoning, nuanced strategic synthesis, anything where you're making a high-stakes decision based on the output.

2. Prompt caching pays off on repeated client context

Anthropic's prompt caching cuts cached input costs by 90%. If you have a large block of context that stays the same across many requests (your firm's standard matter intake template, a corpus of prior client communications, a regulatory framework that applies to multiple matters), you can cache that block and pay only 10% of the normal input cost on subsequent hits.

The break-even on prompt caching is roughly 2 requests. If you're running the same context-heavy prompt more than twice, caching is saving you money. With the tokenizer now generating 35% more tokens from the same input, the absolute dollar value of each cache hit has also gone up, which makes caching even more worth setting up than it was before.

Batch processing is the other lever. Anthropic offers 50% discounts on batch API requests for tasks that don't need real-time completion. If you're processing a backlog of intake documents or running overnight analysis, batch is straightforward money left on the table if you're not using it.

3. Audit what you're actually sending

This sounds obvious but most people haven't done it. The tokenizer change is a good forcing function to look at your longest prompts and ask: what's in here that doesn't need to be?

Every block of text you include in a prompt costs money at the current tokenizer rate. Conversation history, long system prompts, verbose context documents: all of these get tokenized at 35% higher density than a year ago. Trim what's not doing work. Restructure context to front-load the most relevant material. Keep system prompts as tight as you can make them.

With Advisor Prep Hero's BYOK setup, you see the token count on every request. That visibility is what makes it possible to catch this. When the token counter is buried in a SaaS provider's interface (or missing entirely), you don't know the cost until you get the bill.

What this means for BYOK professionals specifically

The whole premise of BYOK is that you're in control of your AI costs and your data path. You're not paying a SaaS markup, and your client information isn't touching an intermediary's servers. You pay for exactly what you use, at the provider's rates, with a direct connection between your machine and the model.

That's a genuinely better deal, for cost reasons and for confidentiality reasons. An attorney using BYOK can describe the data path to bar counsel: client data goes from my machine to Anthropic's API, full stop. No intermediary. That's the architecture ABA Formal Opinion 512 asks you to understand and be able to account for.

But BYOK comes with a responsibility that managed-AI-tool users don't have: you need to understand what you're paying for and why. The tokenizer change is exactly the kind of thing that can silently inflate your bill if you're not paying attention.

The short version of what's changed since April 2026:

Opus 4.7's tokenizer generates ~35% more tokens from the same input text compared to Opus 4.6
Per-token prices are unchanged, but your effective cost per request has risen accordingly for prose-heavy professional work
Haiku 3 is retired; the cheapest Claude model is now Haiku 4.5 at $1/$5 per million tokens (4x the floor price)
Batch processing and prompt caching are your two best cost levers and both are worth using
Sonnet 4.6 is the right default for most client work; Opus 4.7 is for the cases where the quality difference actually matters

None of this is a reason to stop using Claude. It's a reason to be a deliberate buyer rather than a passive one. And that's exactly what BYOK makes possible.

Try Advisor Prep Hero with your own API key

Jameson Daines builds Advisor Prep Hero for attorneys, CPAs, and independent consultants. Read about the BYOK math for professional users or get Advisor Prep Hero at advisorprephero.com.