How much does a 10-variant autoresearch loop cost on the Claude API?

A 10-variant optimization run that sends about 25,000 input tokens and 8,000 output tokens costs roughly $0.07 on Haiku 4.5, $0.20 on Sonnet 4.6, or $0.33 on Opus 4.8 without caching. Enabling prompt caching on the reused strategy context cuts the input portion by about 70%, bringing a Sonnet run down to roughly $0.14.

How much does prompt caching save on the Claude API?

Cache reads bill at 10% of the standard input price, so the repeated portion of your prompt costs about 90% less after the first call. On a 10-variant Sonnet 4.6 loop that reuses a 2,000-token context block, caching cuts the input cost roughly 72% — from about $0.075 to $0.021 per run. The whole-run total drops less (around 30%) because output tokens are not cached.

Claude API Pricing in 2026: Model-by-Model Cost Breakdown

Q: How much does the Claude API cost in 2026?

Anthropic prices the Claude API per million tokens, split between input and output. As of June 2026: Claude Opus 4.8 is $5 per million input tokens and $25 per million output; Claude Sonnet 4.6 is $3 input and $15 output; Claude Haiku 4.5 is $1 input and $5 output. Prompt-cache reads bill at 10% of the input price. Verified against Anthropic's published pricing.

Q: What is the difference between Opus, Sonnet, and Haiku for marketing work?

Haiku 4.5 is fast and cheap but produces more generic copy — ideal for scoring and evaluation, less so for nuanced brand-voice variations. Sonnet 4.6 is the production standard: strong enough to write competitive copy variants under specific constraints at practical cost. Opus 4.8 is usually overkill for marketing copy; reserve it for complex strategic synthesis rather than headline variants.

How much does the Claude API cost in 2026?

As of June 2026, the Claude API costs $1–$5 per million input tokens and $5–$25 per million output tokens, depending on the model. Anthropic prices access per million tokens, separately for input (what you send) and output (what the model returns). The gap between models is large enough that picking the wrong one for a loop costs you 5× more per run. Here are the verified figures, checked against Anthropic's official pricing documentation on June 2026.

Claude API pricing — verified June 2026
Model	Input (per MTok)	Output (per MTok)	Best for
Claude Opus 4.8	$5.00	$25.00	Complex reasoning, creative judgment, long-form synthesis
Claude Sonnet 4.6	$3.00	$15.00	Most production tasks — strong capability, practical cost
Claude Haiku 4.5	$1.00	$5.00	High-volume, fast, low-cost tasks; evaluation scoring

A token is roughly three-quarters of a word. A 1,000-word document is about 1,300 tokens. Most marketing prompts — a landing page headline variation, a subject line test, an ad copy rewrite — run 500–2,000 tokens each way. That puts a single request on Sonnet at roughly $0.005 to $0.04, which sounds cheap until you multiply by hundreds of iterations in a loop.

The price per token also varies between input and output for a reason: generating tokens is computationally more expensive than reading them. In practice, output tokens dominate your bill in generation-heavy tasks (writing, rewriting) and input tokens dominate in analysis-heavy tasks (scoring, evaluating). Structuring your loop to shift work toward input over output — by caching repeated context and shortening generated responses — is the primary cost lever before you even touch model selection.

What does a 10-variant autoresearch run actually cost?

A 10-variant optimization run costs roughly $0.07 on Haiku 4.5, $0.20 on Sonnet 4.6, or $0.33 on Opus 4.8 — before prompt caching. A typical autoresearch loop for a landing page or email tests 10 variants per session: each iteration sends a strategy context (your baseline, rules, and constraints), a generation prompt, and receives the output variation plus an evaluation score. Here's the math behind those numbers.

Assume: 2,000 tokens of shared strategy context per call, 500 tokens of unique per-variant prompt, 600 tokens of output per variant, 200 tokens of evaluation output. That's 2,500 tokens in and 800 tokens out per variant, times 10 variants — roughly 25,000 input tokens and 8,000 output tokens per session.

10-variant run cost by model — no caching, June 2026 rates
Model	Input cost	Output cost	Total per run
Opus 4.8	$0.125	$0.200	~$0.33
Sonnet 4.6	$0.075	$0.120	~$0.20
Haiku 4.5	$0.025	$0.040	~$0.07

Note: these figures assume no prompt caching. Real-world costs drop significantly once you enable caching — see below. Token counts are approximate and will vary with your specific prompts. Run your own numbers using Anthropic's token counter at console.anthropic.com.

At first glance these numbers look trivial — even $0.33 per 10-variant run is less than a cup of coffee. But loops compound. Run 10 sessions per day across three pages and you're at 300 sessions a month — roughly $100/month on Opus, $60/month on Sonnet, or $21/month on Haiku before caching. For the same job, switching from Opus to Haiku cuts your bill by 80%. And adding prompt caching cuts it further still.

The practical recommendation: use Haiku 4.5 for evaluation and scoring (the high-volume, judgment-light part of the loop) and Sonnet 4.6 for generation (where reasoning quality matters for the output). This hybrid approach typically delivers a better cost-quality tradeoff than running the entire loop on a single model.

Free · email only 12-point

Is your prompt ready to run on a loop?

A 12-point check before you start spending API credits — sent straight to your inbox. No card.

Take the free assessment

What is prompt caching and how much does it save?

Prompt caching cuts your repeated-input cost by up to ~90% — cache reads bill at just 10% of the standard input price (per Anthropic's prompt-caching pricing, verified June 2026). It works by storing the portion of your input that doesn't change between calls — your strategy file, baseline copy, and evaluation rubric — so after the first request you pay full price only for the small unique prompt on top.

For a loop that reuses a fixed context block, this is the most impactful cost lever available, and it requires no change to your prompt logic — only a small addition to your API call structure. The cache is keyed to the exact token sequence, so identical context blocks across calls share the same cache entry. The minimum cacheable prefix is 1,024 tokens on Sonnet 4.6 and Opus 4.8, but 4,096 tokens on Haiku 4.5 — worth knowing if you plan to cache context on a Haiku evaluation step.

10-variant Sonnet 4.6 run — with vs without prompt caching (25k input / 8k output tokens)
Setup	Input cost	Output cost	Total per run
No caching	$0.075 (25k @ full)	$0.120	~$0.20
With caching (2,000-token context block reused per call)	$0.021 (5k full + 20k @ 10%)	$0.120	~$0.14

Caching slashes the input portion from $0.075 to about $0.021 — a ~72% cut on input cost. Because output is untouched, the whole-run total drops about 30%, from ~$0.20 to ~$0.14. Scaled to 300 sessions a month, that's a $60 Sonnet bill falling to roughly $42 — from caching alone, without touching model selection or output volume. Stack caching with the hybrid model split (Haiku for scoring, Sonnet for generation) and a moderately sized loop lands comfortably under $30/month at Sonnet-quality output.

There's one caveat worth naming honestly: prompt caching saves on input tokens, not output tokens. If your loop generates long outputs — multi-paragraph copy variants, complete page drafts — the output cost dominates and caching helps less in percentage terms. For most marketing optimization loops (headline variants, subject lines, CTA rewrites), outputs are short, so the input side is a meaningful share of the bill and caching is well worth enabling.

How the autoresearch loop is designed to minimize cost

A well-structured autoresearch loop exploits both levers automatically. The strategy file — your baseline, constraints, target audience, and evaluation rubric — lives in a fixed preamble that caches across every iteration. Each variant call sends only a small unique prompt on top of the cached context. Evaluation calls use a cheaper model (Haiku) to score the output rather than paying Sonnet or Opus rates for a task that's really just classification.

The result is that the per-variant marginal cost is very low — you're paying mostly for unique tokens, not repeated context. This is the architecture the Autoresearch Playbook's technical guide covers in detail, including the exact API call structure and caching configuration. The patterns are also implemented in the 12 ready-to-run templates in the Playbook, so you don't need to figure out the caching structure from scratch.

One more structural choice matters: output length. Constraining the model to return variants in a specific short format — a JSON object with the variant text and score, rather than a discursive explanation — cuts output tokens by 50–70% compared to letting the model explain its reasoning at length. That's not a quality tradeoff; it's just telling the model to be concise, which is part of the prompt template.

Claude subscription vs API — which should you use?

Use a subscription for interactive work and the API for automated loops — the deciding factor is whether a human is watching each step. Interactive use — you reading each response, deciding what to run next — is best served by a subscription. Claude Pro ($20/month), Max 5x ($100/month), and Max 20x ($200/month) all include generous interactive usage with Claude in the browser and terminal. That's your tool for strategic work: brainstorming, reviewing variants, editing copy.

Programmatic use — automated loops, scripted agents, Claude Code running unattended — bills per token through the API, via a standalone API key (or whatever programmatic allowance your subscription includes). The key point for a loop: it is billed by the token, every iteration — which is exactly what the local-model route below avoids.

Subscription vs direct API — decision guide
Usage type	Best option	Why
Interactive chat, terminal, Claude Code with review	Claude Pro or Max subscription	Flat rate, no per-token billing
Automated loops, modest volume	Claude Pro or Max + an API key for the loop	Interactive quota for manual work; per-token API for automation
High-volume programmatic work beyond credit limits	Direct API key (console.anthropic.com)	No subscription overhead; pure pay-per-token
Evaluation-only or scoring tasks at high volume	API key, Haiku model	5× cheaper than Sonnet for judgment-light work

For most solo operators running an autoresearch loop a few times a week, a Claude Pro or Max subscription handles the interactive work, and a standalone API key (with a hard cost-stop in the loop) covers the automation for cents per run. The bigger savings lever is routing the high-volume scoring half of the loop to a local model — $0 per token — and reserving the API for the creative generation.

Frequently asked questions

How much does 1 million Claude tokens cost?

It depends on the model and direction. Input tokens: Opus 4.8 = $5, Sonnet 4.6 = $3, Haiku 4.5 = $1 per million. Output tokens: Opus 4.8 = $25, Sonnet 4.6 = $15, Haiku 4.5 = $5 per million. Most typical prompts use far less than a million tokens — a 1,000-word document is about 1,300 tokens, so $1 on Haiku gets you roughly 750,000 input tokens worth of content.

Is the Claude API cheaper than a Claude subscription?

For interactive use, a Claude Pro subscription at $20/month is almost always cheaper than paying per token, because the interactive quota is effectively unlimited within reasonable use. For programmatic, scripted, or looped usage, the API can be cheaper once you optimize model selection and caching — especially if you use Haiku for evaluation tasks. For most solo operators, the sweet spot is a subscription for interactive work plus occasional API calls for automation.

What's the difference between Opus, Sonnet, and Haiku for marketing work?

For marketing optimization loops, the practical difference is: Haiku is fast and cheap but produces more generic copy — fine for scoring and evaluation, not ideal for generating variations that require nuanced brand voice. Sonnet is the production standard — strong enough to write competitive copy variants with specific constraints, cost-effective for generation tasks. Opus is overkill for most marketing copy; use it for complex strategic synthesis, not headline variants.

How do I enable prompt caching with the Claude API?

Add "cache_control": {"type": "ephemeral"} to the content block you want cached in your messages array. Anthropic caches the prefix up to that marker — typically your system prompt or the shared strategy context at the top of your conversation. The cache is ephemeral (lasts about 5 minutes by default) and keyed to the exact token sequence. The Autoresearch Playbook templates include this structure pre-configured; see the technical guide for the full API call pattern.

How much does the Claude API actually cost? A model-by-model breakdown.