What Is Autoresearch? The AI Optimization Loop for Marketers (2026)

What is autoresearch?

Definition

Autoresearch is an AI optimization loop in which an agent repeatedly proposes a change, measures it against a scorecard, keeps the change only if it beats the current baseline, and reverts it if it doesn't — then repeats automatically. It was open-sourced by AI researcher Andrej Karpathy in March 2026 and went viral as a legible, hands-off way to let an AI improve its own work. Marketers apply the same loop to copy and layouts instead of model code.

Mechanically, the loop has four steps: an AI agent proposes a change to something — a headline, a call-to-action button, an email subject line, a landing page section. A second pass evaluates that change against a defined scorecard. If the change scores better than the current baseline, it's kept. If it doesn't, it's reverted. Then the loop runs again with a new proposal.

That's the whole method. The "research" part isn't literature review or user surveys — it's the AI generating, testing, and scoring its own proposals in rapid sequence, with a hard keep-or-revert gate after each one. The loop runs until you hit a budget limit, a time limit, or a score threshold.
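Because the method is this small, it fits in a few lines of code. Below is a minimal, hypothetical Python sketch. It assumes the agent's proposal step and the scorecard can each be wrapped as a plain function. In real use, `propose` would call an AI agent and `score` would be an AI judge or a rule-based check. The names `autoresearch_loop`, `toy_propose`, and `toy_score` are illustrative, and the toy stand-ins exist only to make it runnable.

```python
import random
from typing import Callable, List, Optional, Tuple

def autoresearch_loop(
    baseline: str,
    propose: Callable[[str], str],
    score: Callable[[str], float],
    max_iterations: int = 20,
    target_score: Optional[float] = None,
) -> Tuple[str, float, List[bool]]:
    """Propose, measure, keep or revert, repeat.

    Returns the final baseline, its score, and a log of decisions
    (True = kept, False = reverted).
    """
    best, best_score = baseline, score(baseline)
    decisions: List[bool] = []
    for _ in range(max_iterations):                      # budget limit
        if target_score is not None and best_score >= target_score:
            break                                        # score threshold reached
        candidate = propose(best)                        # 1. propose a change
        candidate_score = score(candidate)               # 2. measure it
        if candidate_score > best_score:                 # 3a. keep: new baseline
            best, best_score = candidate, candidate_score
            decisions.append(True)
        else:                                            # 3b. revert: baseline unchanged
            decisions.append(False)
    return best, best_score, decisions                   # 4. repeat = the for loop


# Hypothetical toy stand-ins: a random word-editing "agent" and a
# scorecard that rewards headlines close to 8 words.
rng = random.Random(0)
WORDS = ["fast", "simple", "reports", "for", "busy", "teams", "today"]

def toy_propose(text: str) -> str:
    words = text.split()
    if len(words) > 1 and rng.random() < 0.5:
        words.pop(rng.randrange(len(words)))                            # delete a word
    else:
        words.insert(rng.randrange(len(words) + 1), rng.choice(WORDS))  # add a word
    return " ".join(words)

def toy_score(text: str) -> float:
    return -abs(len(text.split()) - 8)

start = "Reports"
best, best_score, decisions = autoresearch_loop(
    start, toy_propose, toy_score, max_iterations=20, target_score=0)
print(best, best_score, decisions.count(True), "kept of", len(decisions))
```

Note the strict `>`: a tie counts as a revert, so the baseline only moves when a change is measurably better. That's one reasonable reading of "beats the baseline."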

What makes it different from regular A/B testing is that there's no traffic split, no waiting for statistical significance, and no external testing platform required. The evaluation happens computationally — either by a second AI scoring the output against your criteria, or by a measurable proxy like page load time, word count within a range, or a rule-based readability check. For some use cases (conversion rate, click rate), you still need real traffic to validate — but the autoresearch loop lets you narrow the variant space dramatically before you run a single live experiment.
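For the rule-based end of that spectrum, here is a minimal sketch of a computational scorecard: a word-count range plus a readability check. It uses the Flesch Reading Ease formula as one possible readability check (our choice; the method doesn't prescribe one) with a rough vowel-group syllable counter, so treat the readability numbers as approximate. Function names and thresholds are hypothetical.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, drop a silent trailing 'e'. Not dictionary-accurate."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher means easier to read."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

def rule_based_score(text: str, min_words: int = 40, max_words: int = 120,
                     min_reading_ease: float = 60.0) -> float:
    """Fraction of rule checks passed, from 0.0 to 1.0."""
    n_words = len(re.findall(r"[A-Za-z']+", text))
    checks = [
        min_words <= n_words <= max_words,                # word count within range
        flesch_reading_ease(text) >= min_reading_ease,    # readability floor
    ]
    return sum(checks) / len(checks)

print(rule_based_score("The cat sat.", min_words=1, max_words=10))  # 1.0
```

A score like this can be computed thousands of times for free, which is why proxies are useful for narrowing candidates before any live traffic is involved.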

Why did it go viral in March 2026?

On March 6, 2026, Andrej Karpathy published a GitHub repository demonstrating the loop: an AI agent that autonomously edits and re-runs a small training experiment, keeping changes only when they improve the score. Within days it reached the front page of Hacker News and was trending across developer social media. It has since passed 88,000 GitHub stars, making it one of the fastest-trending repositories of early 2026, and has spawned more than 2,700 community implementations.

The speed came from the idea's clarity. The loop is legible in a way most AI research is not. You don't need to understand transformer architectures, loss functions, or gradient descent to grasp "propose, measure, keep or revert, repeat." The GIF demonstrations made the idea obvious at a glance, and people shared it because they could picture their own application on first read.

The original demo was about model training: per the repo, an AI agent edits the training code, runs short training jobs on a single GPU, and uses a model-quality metric to decide whether to keep each change — a hands-off take on self-improvement. That framing required GPUs, Python, and machine learning context. But within days, developers and marketers began asking the obvious question: what if you applied the same loop to things you already control, without training weights? Webpages. Emails. Ad copy. Sales sequences.

Within weeks the conversation had split into two tracks: the ML-research track (self-improving models) and the applied-optimization track (self-improving marketing assets). The Autoresearch Playbook is squarely in the second track.

How does the autoresearch loop work?

The autoresearch loop runs a four-step cycle (propose, measure, keep or revert, repeat) until it hits a limit. The decisive step is keep or revert, a term you've probably seen in version control: each change either becomes the new baseline or is rolled back. Here's how it runs for a marketing use case.

One loop iteration — example: landing page headline
Step	What happens	Who does it
1. Propose	The AI reads your current headline and your program.md instruction file, then generates a variant	AI agent (e.g. Claude Code)
2. Measure	A scorecard evaluates the variant: clarity score, keyword presence, character count, or a second AI judge rating it against criteria	Evaluator (AI or rule-based)
3. Keep or revert	If the variant beats the baseline score, it becomes the new baseline. If not, the original is restored.	Automated decision rule
4. Repeat	The loop runs again with the updated (or restored) baseline, up to your defined limit	Loop controller

The instruction file — sometimes called program.md — is the most important piece. It tells the agent what to optimize, what the scoring criteria are, what the limits are (max cost, max iterations), and what "better" looks like. Without a well-written instruction file, the loop produces random variation. With one, it produces consistent, directional improvement.
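To make those ingredients concrete, here is a sketch that renders them into a program.md. The section headings and field names are our own assumptions, not a required format; the agent simply reads the result as plain English. The `/pricing` page and criteria are hypothetical examples.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InstructionFile:
    """Hypothetical structure for what a program.md should pin down."""
    target: str            # what to optimize
    criteria: List[str]    # scoring criteria
    max_iterations: int    # limits
    max_cost_usd: float
    better_means: str      # what "better" looks like

    def to_markdown(self) -> str:
        lines = ["# program.md", "", "## What to optimize", self.target, "",
                 "## Scoring criteria"]
        lines += [f"- {c}" for c in self.criteria]
        lines += ["", "## Limits",
                  f"- Max iterations: {self.max_iterations}",
                  f"- Max cost: ${self.max_cost_usd:.2f}",
                  "", "## What better means", self.better_means]
        return "\n".join(lines)

spec = InstructionFile(
    target="The hero headline on /pricing",
    criteria=["Contains 'payroll software'", "Under 12 words", "No jargon"],
    max_iterations=20,
    max_cost_usd=2.00,
    better_means="A higher average score across the criteria; ties revert.",
)
print(spec.to_markdown())
```

Writing the spec as structured fields first is one way to make sure none of the four pieces (target, criteria, limits, definition of better) is left implicit.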

The Autoresearch Playbook's 12 templates are essentially pre-written instruction files for the most common marketing optimization tasks: landing page headlines, CTA copy, email subject lines, cold email sequences, and more. Each template encodes the right scoring criteria for that use case, so you're not building the scorecard from scratch.

For the full technical breakdown of how the loop is structured and what makes a good program.md, see the complete guide to how the autoresearch loop works.

Can marketers use autoresearch without coding?

Yes — marketers can run autoresearch without writing any code. The marketing application looks nothing like the ML research version: you're not training model weights, running CUDA kernels, or touching Python. You're optimizing assets you already own — and the AI agent does the editing work, not you.

The three areas where marketers see the most immediate value are landing pages, email sequences, and ad copy. Here's what the loop looks like in each.

Landing page optimization

You point the loop at your landing page's headline section, hero copy, or call-to-action. The agent generates variants — different word choices, different lengths, different emphasis — and scores each one against criteria you specify: does it contain the primary keyword? Is it under 12 words? Does it pass a clarity test? The loop keeps the version with the highest score and generates the next variant from that baseline. After 10-20 iterations, you have a headline that's been systematically tightened against your criteria, not just one round of "let me try a few ideas."
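The three example criteria translate directly into a scorecard. In this minimal sketch, the "clarity test" is stood in for by a hypothetical jargon list; in practice that check might be a second AI judge instead. `headline_score` and `JARGON` are illustrative names.

```python
import re

# Hypothetical jargon list standing in for a "clarity test".
JARGON = {"synergy", "leverage", "best-in-class", "paradigm", "holistic"}

def headline_score(headline: str, keyword: str, max_words: int = 12) -> float:
    """Fraction of criteria passed: keyword present, under max_words, no jargon."""
    words = re.findall(r"[\w'-]+", headline.lower())
    checks = [
        keyword.lower() in headline.lower(),     # contains the primary keyword
        len(words) < max_words,                  # under 12 words
        not any(w in JARGON for w in words),     # crude clarity proxy
    ]
    return sum(checks) / len(checks)

print(headline_score("Payroll software that runs itself", "payroll software"))           # 1.0
print(headline_score("Leverage synergy with our holistic platform", "payroll software"))  # 0.3333333333333333
```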

The practical advantage over traditional A/B testing: you can run 20 iterations before lunch and have a shortlist of 3-4 genuinely different candidates to put into a live A/B test. Traditional A/B platforms require you to generate variants manually, set up traffic splits, wait for significance, and repeat. The autoresearch loop compresses the ideation-and-filtering phase from days to hours.

Email sequence optimization

Subject lines and email bodies are ideal autoresearch targets because the scoring criteria are measurable and explicit: open rate proxies (power words, urgency signals, personalization tokens), body length, link placement, tone consistency. A loop on a cold email sequence can run 30 variants of a subject line in under an hour, scoring each one against a multi-criteria rubric, before you pick the top candidates to send in a real split test.
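Here is a minimal sketch of such a multi-criteria rubric with weights. The word lists, the weights, the `{{first_name}}` token syntax, and the 20 to 50 character window are illustrative assumptions; swap in your own rubric.

```python
import re

POWER_WORDS = {"free", "new", "proven", "instantly", "exclusive"}   # hypothetical
URGENCY_WORDS = {"today", "now", "last", "ends", "deadline"}        # hypothetical
TOKEN_PATTERN = re.compile(r"\{\{\s*\w+\s*\}\}")                   # assumes {{first_name}}-style tokens

# Hypothetical weights; they sum to 1.0 so a perfect subject line scores 1.0.
WEIGHTS = {"power_word": 0.25, "urgency": 0.25, "personalized": 0.3, "length_ok": 0.2}

def subject_line_score(subject: str, min_chars: int = 20, max_chars: int = 50) -> float:
    """Weighted sum of the rubric checks that pass."""
    words = set(re.findall(r"[a-z']+", subject.lower()))
    checks = {
        "power_word": bool(words & POWER_WORDS),
        "urgency": bool(words & URGENCY_WORDS),
        "personalized": bool(TOKEN_PATTERN.search(subject)),
        "length_ok": min_chars <= len(subject) <= max_chars,
    }
    return sum(WEIGHTS[name] for name, passed in checks.items() if passed)

candidates = [
    "Quarterly update",
    "{{first_name}}, your free audit ends today",
    "New: our proven onboarding checklist",
]
ranked = sorted(candidates, key=subject_line_score, reverse=True)
print(ranked)
```

Inside the loop, the same function would serve as the keep-or-revert scorecard; the ranking at the end is how you'd pick top candidates for a real split test.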

Ad copy

For paid search and social ads, the autoresearch loop is particularly powerful because the character limits and structural requirements (headline 1, headline 2, description) create a well-defined scoring surface. The agent proposes, scores against character limits and keyword density requirements, keeps the variants that pass, and generates the next round. What typically takes a copywriter an afternoon — brainstorming and filtering 50 ad variations — can be done in a 15-minute loop run.
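Here is a minimal sketch of that pass/fail gate. The 30/30/90 character limits are an assumption modeled on Google Ads text-ad fields, so check your platform's current specs. For simplicity it checks keyword presence in a headline rather than keyword density. `AdVariant`, `ad_passes`, and `filter_round` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Assumed limits, modeled on Google Ads text-ad fields (30/30/90 characters).
# Check your ad platform's current specs before relying on them.
LIMITS = {"headline_1": 30, "headline_2": 30, "description": 90}

@dataclass
class AdVariant:
    headline_1: str
    headline_2: str
    description: str

def ad_passes(ad: AdVariant, keyword: str) -> bool:
    """Hard gate: every field within its limit and the keyword in a headline."""
    within_limits = all(len(getattr(ad, field)) <= limit for field, limit in LIMITS.items())
    headlines = f"{ad.headline_1} {ad.headline_2}".lower()
    return within_limits and keyword.lower() in headlines

def filter_round(variants: List[AdVariant], keyword: str) -> List[AdVariant]:
    """Keep only the variants that pass; these seed the next round of proposals."""
    return [v for v in variants if ad_passes(v, keyword)]

round_1 = [
    AdVariant("Payroll Software for SMBs", "Set Up in 10 Minutes",
              "Run payroll, taxes and benefits from one dashboard. Free 30-day trial."),
    AdVariant("The Best Payroll Software for Growing Teams", "Try It Free",  # headline 1 is 43 chars
              "Payroll without spreadsheets."),
]
survivors = filter_round(round_1, "payroll software")
print(len(survivors))  # 1
```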

What you actually need to start

You need three things: a defined asset to optimize (a page, an email, an ad), a scoring rubric for that asset (what does "better" look like, as specific criteria), and a way to run the loop (Claude Code or Cursor pointed at your instruction file). No developer, no traffic split, no A/B platform subscription required. The Autoresearch Playbook provides the instruction files — pre-built for 12 marketing use cases — so you don't have to write the scoring rubric from scratch.

Frequently asked questions

Is autoresearch the same as the Karpathy loop?

In practice, yes. "Karpathy loop" became popular shorthand after the repository went viral in March 2026, and both terms refer to the same method: propose, measure, keep or revert, repeat. This content is independently created and is not endorsed by or affiliated with Andrej Karpathy.

Do I need to know how to code to run autoresearch?

No. You write a plain-English instruction file (the program.md), install an AI coding agent like Claude Code, and point it at your file. The agent handles the iteration. You review the output. No Python, no APIs, no dev environment required beyond the agent install.

How is autoresearch different from asking ChatGPT for 10 variations?

When you ask for 10 variations, you get a flat list: nothing is evaluated, nothing is eliminated, and you choose manually. The autoresearch loop scores each variant against a scorecard, keeps it only if it beats the current baseline, and uses that winner as the starting point for the next round. Each iteration builds on the best result so far, so the output is not a flat list but the best version after N rounds of keep-or-revert selection.

How much does it cost to run?

A 10-variant loop on Claude Sonnet 4.6 (at $3/$15 per million input/output tokens) costs roughly $0.10–$0.40, depending on asset length. Prompt caching can cut that by 60–80%. Running a local model via Ollama costs $0 in API fees for non-training work like copy and headline optimization. The Autoresearch Playbook templates include a hard cost-stop parameter, so a loop can never surprise-bill you.
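The arithmetic behind that range is simple enough to sketch. The per-variant token counts below are assumptions (roughly an agent call plus a judge call, with the asset and rubric in context), and prompt caching isn't modeled. `CostGuard` is a hypothetical illustration of a hard cost-stop, not the Playbook's actual parameter.

```python
def loop_cost_usd(n_variants: int, input_tokens: int, output_tokens: int,
                  input_price_per_m: float = 3.0, output_price_per_m: float = 15.0) -> float:
    """Cost of n_variants iterations, given per-variant token counts and $/million-token prices."""
    per_variant = (input_tokens * input_price_per_m
                   + output_tokens * output_price_per_m) / 1_000_000
    return n_variants * per_variant

class CostGuard:
    """Hypothetical hard stop: refuse any charge that would push spend past the cap."""
    def __init__(self, max_cost_usd: float) -> None:
        self.max_cost_usd = max_cost_usd
        self.spent_usd = 0.0

    def try_charge(self, cost_usd: float) -> bool:
        if self.spent_usd + cost_usd > self.max_cost_usd:
            return False          # stop the loop before overspending
        self.spent_usd += cost_usd
        return True

# Assumed token counts per variant: short asset vs. long asset.
print(round(loop_cost_usd(10, 2_000, 300), 4))   # 0.105
print(round(loop_cost_usd(10, 8_000, 700), 4))   # 0.345

# With a $0.10 cap, the guard halts a long-asset loop after 2 iterations.
guard = CostGuard(max_cost_usd=0.10)
per_iteration = loop_cost_usd(1, 8_000, 700)     # about $0.0345
iterations = 0
while guard.try_charge(per_iteration):
    iterations += 1
print(iterations)  # 2
```

Checking the cap before each call, rather than after, is what makes the stop "hard": spend never exceeds the limit even by one iteration.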

What's the Autoresearch Playbook?

A 29-page guide plus 12 ready-to-run program.md templates, each one pre-built for a specific marketing or sales optimization task (landing page headlines, cold email sequences, ad copy, CTA copy, and more). The templates encode the scoring criteria for each use case so you don't build the rubric from scratch. $97 one-time, 14-day money-back guarantee.
