Pricing Cost math Caching FAQ The Playbook Get the Playbook
Pricing & cost

How much does the Claude API actually cost? A model-by-model breakdown.

Opus 4.8, Sonnet 4.6, Haiku 4.5 — the price differences are significant, and the right model choice for a 10-variant optimization loop can mean the difference between $0.20 a run and $2.00 a run. Here's the verified pricing, the real cost math, and how prompt caching changes everything.

By the maker of the Autoresearch Playbook Last verified: June 15, 2026
The short version
  • Verified June 2026 pricing: Opus 4.8 = $5 in / $25 out, Sonnet 4.6 = $3 / $15, Haiku 4.5 = $1 / $5 per million tokens.
  • A 10-variant autoresearch run costs roughly $0.20 on Haiku, $0.70 on Sonnet, or $2.00 on Opus without caching.
  • Prompt caching bills repeated context at 10% of the input price — cutting input cost ~70%+ for loops that reuse the same strategy file every iteration (the whole-run total drops less, since output isn't cached).
  • For most non-training marketing loops, Sonnet 4.6 + prompt caching is the best cost-quality tradeoff.

How much does the Claude API cost in 2026?

As of June 2026, the Claude API costs $1–$5 per million input tokens and $5–$25 per million output tokens, depending on the model. Anthropic prices access per million tokens, separately for input (what you send) and output (what the model returns). The gap between models is large enough that picking the wrong one for a loop costs you 5× more per run. Here are the verified figures, checked against Anthropic's official pricing documentation on June 15, 2026.

Claude API pricing — verified June 15, 2026
ModelInput (per MTok)Output (per MTok)Best for
Claude Opus 4.8$5.00$25.00Complex reasoning, creative judgment, long-form synthesis
Claude Sonnet 4.6$3.00$15.00Most production tasks — strong capability, practical cost
Claude Haiku 4.5$1.00$5.00High-volume, fast, low-cost tasks; evaluation scoring

A token is roughly three-quarters of a word. A 1,000-word document is about 1,300 tokens. Most marketing prompts — a landing page headline variation, a subject line test, an ad copy rewrite — run 500–2,000 tokens each way. That puts a single request on Sonnet at roughly $0.005 to $0.04, which sounds cheap until you multiply by hundreds of iterations in a loop.

The price per token also varies between input and output for a reason: generating tokens is computationally more expensive than reading them. In practice, output tokens dominate your bill in generation-heavy tasks (writing, rewriting) and input tokens dominate in analysis-heavy tasks (scoring, evaluating). Structuring your loop to shift work toward input over output — by caching repeated context and shortening generated responses — is the primary cost lever before you even touch model selection.

What does a 10-variant autoresearch run actually cost?

A 10-variant optimization run costs roughly $0.07 on Haiku 4.5, $0.20 on Sonnet 4.6, or $0.33 on Opus 4.8 — before prompt caching. A typical autoresearch loop for a landing page or email tests 10 variants per session: each iteration sends a strategy context (your baseline, rules, and constraints), a generation prompt, and receives the output variation plus an evaluation score. Here's the math behind those numbers.

Assume: 2,000 tokens of shared strategy context per call, 500 tokens of unique per-variant prompt, 600 tokens of output per variant, 200 tokens of evaluation output. That's 2,500 tokens in and 800 tokens out per variant, times 10 variants — roughly 25,000 input tokens and 8,000 output tokens per session.

10-variant run cost by model — no caching, June 2026 rates
ModelInput costOutput costTotal per run
Opus 4.8$0.125$0.200~$0.33
Sonnet 4.6$0.075$0.120~$0.20
Haiku 4.5$0.025$0.040~$0.07

Note: these figures assume no prompt caching. Real-world costs drop significantly once you enable caching — see below. Token counts are approximate and will vary with your specific prompts. Run your own numbers using Anthropic's token counter at console.anthropic.com.

At first glance these numbers look trivial — even $0.33 per 10-variant run is less than a cup of coffee. But loops compound. Run 10 sessions per day across three pages and you're at 300 sessions a month — roughly $100/month on Opus, $60/month on Sonnet, or $21/month on Haiku before caching. For the same job, switching from Opus to Haiku cuts your bill by 80%. And adding prompt caching cuts it further still.

The practical recommendation: use Haiku 4.5 for evaluation and scoring (the high-volume, judgment-light part of the loop) and Sonnet 4.6 for generation (where reasoning quality matters for the output). This hybrid approach typically delivers a better cost-quality tradeoff than running the entire loop on a single model.

Free · email only 12-point
Is your prompt ready to run on a loop?

A 12-point check before you start spending API credits — sent straight to your inbox. No card.

Take the free assessment

What is prompt caching and how much does it save?

Prompt caching cuts your repeated-input cost by up to ~90% — cache reads bill at just 10% of the standard input price (per Anthropic's prompt-caching pricing, verified June 15, 2026). It works by storing the portion of your input that doesn't change between calls — your strategy file, baseline copy, and evaluation rubric — so after the first request you pay full price only for the small unique prompt on top.

For a loop that reuses a fixed context block, this is the most impactful cost lever available, and it requires no change to your prompt logic — only a small addition to your API call structure. The cache is keyed to the exact token sequence, so identical context blocks across calls share the same cache entry. The minimum cacheable prefix is 1,024 tokens on Sonnet 4.6 and Opus 4.8, but 4,096 tokens on Haiku 4.5 — worth knowing if you plan to cache context on a Haiku evaluation step.

10-variant Sonnet 4.6 run — with vs without prompt caching (25k input / 8k output tokens)
SetupInput costOutput costTotal per run
No caching$0.075 (25k @ full)$0.120~$0.20
With caching (2,000-token context block reused per call)$0.021 (5k full + 20k @ 10%)$0.120~$0.14

Caching slashes the input portion from $0.075 to about $0.021 — a ~72% cut on input cost. Because output is untouched, the whole-run total drops about 30%, from ~$0.20 to ~$0.14. Scaled to 300 sessions a month, that's a $60 Sonnet bill falling to roughly $42 — from caching alone, without touching model selection or output volume. Stack caching with the hybrid model split (Haiku for scoring, Sonnet for generation) and a moderately sized loop lands comfortably under $30/month at Sonnet-quality output.

There's one caveat worth naming honestly: prompt caching saves on input tokens, not output tokens. If your loop generates long outputs — multi-paragraph copy variants, complete page drafts — the output cost dominates and caching helps less in percentage terms. For most marketing optimization loops (headline variants, subject lines, CTA rewrites), outputs are short, so the input side is a meaningful share of the bill and caching is well worth enabling.

How the autoresearch loop is designed to minimize cost

A well-structured autoresearch loop exploits both levers automatically. The strategy file — your baseline, constraints, target audience, and evaluation rubric — lives in a fixed preamble that caches across every iteration. Each variant call sends only a small unique prompt on top of the cached context. Evaluation calls use a cheaper model (Haiku) to score the output rather than paying Sonnet or Opus rates for a task that's really just classification.

The result is that the per-variant marginal cost is very low — you're paying mostly for unique tokens, not repeated context. This is the architecture the Autoresearch Playbook's technical guide covers in detail, including the exact API call structure and caching configuration. The patterns are also implemented in the 12 ready-to-run templates in the Playbook, so you don't need to figure out the caching structure from scratch.

One more structural choice matters: output length. Constraining the model to return variants in a specific short format — a JSON object with the variant text and score, rather than a discursive explanation — cuts output tokens by 50–70% compared to letting the model explain its reasoning at length. That's not a quality tradeoff; it's just telling the model to be concise, which is part of the prompt template.

Claude subscription vs API — which should you use?

Use a subscription for interactive work and the API for automated loops — the deciding factor is whether a human is watching each step. Interactive use — you reading each response, deciding what to run next — is best served by a subscription. Claude Pro ($20/month), Max 5x ($100/month), and Max 20x ($200/month) all include generous interactive usage with Claude in the browser and terminal. That's your tool for strategic work: brainstorming, reviewing variants, editing copy.

Programmatic use — automated loops, scripted agents, Claude Code running unattended — draws from the API, either via a standalone API key or via the agent credit that Anthropic added to subscriptions after June 15, 2026. The agent credit ($20 on Pro, up to $200 on Max 20x) covers a lot of loop usage before overages kick in at standard API rates.

Subscription vs direct API — decision guide
Usage typeBest optionWhy
Interactive chat, terminal, Claude Code with reviewClaude Pro or Max subscriptionFlat rate, no per-token billing
Automated loops within subscription agent creditMax 5x or Max 20xLarger agent credit covers more unattended runs
High-volume programmatic work beyond credit limitsDirect API key (console.anthropic.com)No subscription overhead; pure pay-per-token
Evaluation-only or scoring tasks at high volumeAPI key, Haiku model5× cheaper than Sonnet for judgment-light work

For most solo operators running an autoresearch loop a few times a week, a Max 5x subscription with its $100 agent credit covers all the loop usage with no overages, while the interactive quota handles all the manual work. Moving to a standalone API key only makes sense once you're running loops at high enough volume that the subscription credit is regularly exhausted — at which point you'll also benefit from fine-grained model selection and caching control that the API enables directly.

Frequently asked questions

How much does 1 million Claude tokens cost?

It depends on the model and direction. Input tokens: Opus 4.8 = $5, Sonnet 4.6 = $3, Haiku 4.5 = $1 per million. Output tokens: Opus 4.8 = $25, Sonnet 4.6 = $15, Haiku 4.5 = $5 per million. Most typical prompts use far less than a million tokens — a 1,000-word document is about 1,300 tokens, so $1 on Haiku gets you roughly 750,000 input tokens worth of content.

Is the Claude API cheaper than a Claude subscription?

For interactive use, a Claude Pro subscription at $20/month is almost always cheaper than paying per token, because the interactive quota is effectively unlimited within reasonable use. For programmatic, scripted, or looped usage, the API can be cheaper once you optimize model selection and caching — especially if you use Haiku for evaluation tasks. For most solo operators, the sweet spot is a subscription for interactive work plus occasional API calls for automation.

What's the difference between Opus, Sonnet, and Haiku for marketing work?

For marketing optimization loops, the practical difference is: Haiku is fast and cheap but produces more generic copy — fine for scoring and evaluation, not ideal for generating variations that require nuanced brand voice. Sonnet is the production standard — strong enough to write competitive copy variants with specific constraints, cost-effective for generation tasks. Opus is overkill for most marketing copy; use it for complex strategic synthesis, not headline variants.

How do I enable prompt caching with the Claude API?

Add "cache_control": {"type": "ephemeral"} to the content block you want cached in your messages array. Anthropic caches the prefix up to that marker — typically your system prompt or the shared strategy context at the top of your conversation. The cache is ephemeral (lasts about 5 minutes by default) and keyed to the exact token sequence. The Autoresearch Playbook templates include this structure pre-configured; see the technical guide for the full API call pattern.

Loop architecture included

12 templates with caching, model routing, and cost guards built in.

The Autoresearch Playbook includes the exact API call structure, caching configuration, and model routing logic for running loops at near-zero cost. $97, one time, 14-day money-back guarantee.

Not sure if your prompt is ready? Take the free 12-point assessment →