What is autoresearch? The AI optimization loop explained for marketers.
Autoresearch is the AI-driven keep-or-revert loop that went viral in March 2026. Here's what it actually is, why it spread to 85,000+ GitHub stars, and how marketers — not machine learning engineers — are running it to improve landing pages, email sequences, and ad copy without touching a line of Python.
- Autoresearch is a method: propose a change, measure it against a scorecard, keep it if it wins, revert if it doesn't, then repeat automatically.
- Andrej Karpathy's open-source repo landed on GitHub on March 6, 2026 and has since passed 85,000 stars — one of the fastest-trending repos of the year.
- The original framing was about ML model training. The marketing application is different: you're optimizing copy, layouts, and sequences — not model weights.
- You don't need a developer, a GPU, or traffic splits to run the loop itself. You need a plain-English instruction file and an AI coding agent.
- The Autoresearch Playbook translates the loop into 12 ready-to-run templates for marketing and sales use cases.
What is autoresearch?
Autoresearch is an AI optimization loop in which an agent repeatedly proposes a change, measures it against a scorecard, keeps the change only if it beats the current baseline, and reverts it if it doesn't — then repeats automatically. It was open-sourced by AI researcher Andrej Karpathy in March 2026 and went viral as a legible, hands-off way to let an AI improve its own work. Marketers apply the same loop to copy and layouts instead of model code.
Mechanically, the loop has four steps. First, an AI agent proposes a change to something: a headline, a call-to-action button, an email subject line, a landing page section. Second, a separate pass evaluates that change against a defined scorecard. Third, the change is kept if it scores better than the current baseline and reverted if it doesn't. Fourth, the loop runs again with a new proposal.
That's the whole method. The "research" part isn't literature review or user surveys — it's the AI generating, testing, and scoring its own proposals in rapid sequence, with a hard keep-or-revert gate after each one. The loop runs until you hit a budget limit, a time limit, or a score threshold.
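The four steps and the three stop conditions can be sketched in a few lines of Python. This is a minimal illustration, not the code from the original repo: `run_loop`, `toy_score`, and the fixed list of variants are hypothetical stand-ins for the AI agent and the scorecard.

```python
import time
from typing import Callable


def run_loop(
    baseline: str,
    propose: Callable[[str], str],
    score: Callable[[str], float],
    max_iterations: int = 20,
    max_seconds: float = 60.0,
    target_score: float = 1.0,
) -> tuple[str, float, int]:
    """Return the best text, its score, and the number of iterations run."""
    best, best_score = baseline, score(baseline)
    started = time.monotonic()
    iterations = 0
    # Stop on the iteration budget, the time limit, or the score threshold.
    while (
        iterations < max_iterations
        and time.monotonic() - started < max_seconds
        and best_score < target_score
    ):
        candidate = propose(best)            # 1. propose
        candidate_score = score(candidate)   # 2. measure
        if candidate_score > best_score:     # 3. keep the winner...
            best, best_score = candidate, candidate_score
        # ...or revert: `best` is simply left unchanged
        iterations += 1                      # 4. repeat
    return best, best_score, iterations


def toy_score(headline: str) -> float:
    """Stand-in scorecard: 1.0 at exactly six words, lower the further away."""
    return 1 / (1 + abs(len(headline.split()) - 6))


# Stand-in for the AI agent: hand out pre-written variants one at a time.
variants = iter([
    "Get many more qualified leads from your existing landing page starting today",
    "Get more qualified leads from your page",
    "Get more qualified leads from your landing page",
    "Get more qualified leads this month",
])

best, best_score, rounds = run_loop(
    baseline="Get more qualified leads from your landing page today",
    propose=lambda current: next(variants),
    score=toy_score,
    max_iterations=4,
)
print(best, best_score, rounds)
```

In this toy run the first and third variants score worse than the current baseline and are reverted, while the second and fourth are kept. A real setup would replace `propose` with a call to the agent and `toy_score` with your own rubric.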
What makes it different from regular A/B testing is that there's no traffic split, no waiting for statistical significance, and no external testing platform required. The evaluation happens computationally — either by a second AI scoring the output against your criteria, or by a measurable proxy like page load time, word count within a range, or a rule-based readability check. For some use cases (conversion rate, click rate), you still need real traffic to validate — but the autoresearch loop lets you narrow the variant space dramatically before you run a single live experiment.
How does the autoresearch loop work?
The autoresearch loop runs a four-step cycle (propose, measure, keep or revert, repeat) until it hits a limit. The core mechanic, keep or revert, mirrors something you've probably seen in version control: a change is either kept or rolled back. Here's how it runs for a marketing use case.
| Step | What happens | Who does it |
|---|---|---|
| 1. Propose | The AI reads your current headline and your program.md instruction file, then generates a variant | AI agent (e.g. Claude Code) |
| 2. Measure | A scorecard evaluates the variant: clarity score, keyword presence, character count, or a second AI judge rating it against criteria | Evaluator (AI or rule-based) |
| 3. Keep or revert | If the variant beats the baseline score, it becomes the new baseline. If not, the original is restored. | Automated decision rule |
| 4. Repeat | The loop runs again with the updated (or restored) baseline, up to your defined limit | Loop controller |
The instruction file — sometimes called program.md — is the most important piece. It tells the agent what to optimize, what the scoring criteria are, what the limits are (max cost, max iterations), and what "better" looks like. Without a well-written instruction file, the loop produces random variation. With one, it produces consistent, directional improvement.
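To make the four ingredients concrete, here is a small Python sketch that assembles an instruction file from them. The section headings and the `Program` class are hypothetical; neither the original repo nor the Playbook templates are known to use this exact layout.

```python
from dataclasses import dataclass


@dataclass
class Program:
    """Hypothetical container for the contents of an instruction file."""
    target: str            # what to optimize
    criteria: list[str]    # how each variant is scored
    max_iterations: int    # limit: number of rounds
    max_cost_usd: float    # limit: spending cap
    better_means: str      # what "better" looks like

    def to_markdown(self) -> str:
        lines = [
            "# Goal",
            f"Optimize: {self.target}",
            "",
            "# Scoring criteria",
            *[f"- {criterion}" for criterion in self.criteria],
            "",
            "# Limits",
            f"- Max iterations: {self.max_iterations}",
            f"- Max cost (USD): {self.max_cost_usd:.2f}",
            "",
            "# What better looks like",
            self.better_means,
        ]
        return "\n".join(lines) + "\n"


program = Program(
    target="the hero headline on the pricing page",
    criteria=[
        "Contains the primary keyword",
        "Is under 12 words",
        "Passes a clarity check",
    ],
    max_iterations=20,
    max_cost_usd=1.00,
    better_means="A higher total score than the current baseline headline.",
)
print(program.to_markdown())
```

The point of the structure is that each field maps to one thing the paragraph above says the file must state. If any of them is missing, the agent has to guess.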
The Autoresearch Playbook's 12 templates are essentially pre-written instruction files for the most common marketing optimization tasks: landing page headlines, CTA copy, email subject lines, cold email sequences, and more. Each template encodes the right scoring criteria for that use case, so you're not building the scorecard from scratch.
For the full technical breakdown of how the loop is structured and what makes a good program.md, see the complete guide to how the autoresearch loop works.
Can marketers use autoresearch without coding?
Yes — marketers can run autoresearch without writing any code. The marketing application looks nothing like the ML research version: you're not training model weights, running CUDA kernels, or touching Python. You're optimizing assets you already own — and the AI agent does the editing work, not you.
The three areas where marketers see the most immediate value are landing pages, email sequences, and ad copy. Here's what the loop looks like in each.
Landing page optimization
You point the loop at your landing page's headline section, hero copy, or call-to-action. The agent generates variants — different word choices, different lengths, different emphasis — and scores each one against criteria you specify: does it contain the primary keyword? Is it under 12 words? Does it pass a clarity test? The loop keeps the version with the highest score and generates the next variant from that baseline. After 10-20 iterations, you have a headline that's been systematically tightened against your criteria, not just one round of "let me try a few ideas."
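A rule-based scorecard for the three headline criteria above might look like the following sketch. The keyword and word-count checks come straight from the text; the clarity check is a crude, assumed proxy (average word length), and `HeadlineRubric` and `score_headline` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HeadlineRubric:
    primary_keyword: str
    max_words: int = 12                # "under 12 words"
    max_avg_word_length: float = 6.0   # assumed stand-in for a clarity test


def score_headline(headline: str, rubric: HeadlineRubric) -> float:
    """Return the fraction of rubric checks the headline passes (0.0 to 1.0)."""
    words = headline.split()
    if not words:
        return 0.0
    checks = [
        rubric.primary_keyword.lower() in headline.lower(),
        len(words) < rubric.max_words,
        sum(len(word) for word in words) / len(words) <= rubric.max_avg_word_length,
    ]
    return sum(checks) / len(checks)


rubric = HeadlineRubric(primary_keyword="email marketing")
print(score_headline("Email marketing that writes itself", rubric))
print(score_headline("Leverage synergistic omnichannel personalization capabilities", rubric))
```

The first headline passes all three checks. The second passes only the word-count check, so the loop would revert it. Swapping the clarity proxy for an AI judge changes one line of the `checks` list, not the structure.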
The practical advantage over traditional A/B testing: you can run 20 iterations before lunch and have a shortlist of 3-4 genuinely different candidates to put into a live A/B test. Traditional A/B platforms require you to generate variants manually, set up traffic splits, wait for significance, and repeat. The autoresearch loop compresses the ideation-and-filtering phase from days to hours.
Email sequence optimization
Subject lines and email bodies are ideal autoresearch targets because the scoring criteria are measurable and explicit: open rate proxies (power words, urgency signals, personalization tokens), body length, link placement, tone consistency. A loop on a cold email sequence can run 30 variants of a subject line in under an hour, scoring each one against a multi-criteria rubric, before you pick the top candidates to send in a real split test.
Ad copy
For paid search and social ads, the autoresearch loop is particularly powerful because the character limits and structural requirements (headline 1, headline 2, description) create a well-defined scoring surface. The agent proposes, scores against character limits and keyword density requirements, keeps the variants that pass, and generates the next round. What typically takes a copywriter an afternoon — brainstorming and filtering 50 ad variations — can be done in a 15-minute loop run.
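The pass/fail part of that scoring surface is easy to express as code. In this sketch the default limits (30 characters per headline, 90 per description) are assumptions based on commonly cited search-ad limits, so confirm them against your platform's current spec. A simple keyword-presence check stands in for a fuller keyword-density rule, and `AdVariant` and `ad_violations` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AdVariant:
    headline_1: str
    headline_2: str
    description: str


def ad_violations(
    ad: AdVariant,
    keyword: str,
    headline_limit: int = 30,      # assumed limit; confirm with your ad platform
    description_limit: int = 90,   # assumed limit; confirm with your ad platform
) -> list[str]:
    """Return a list of rule violations; an empty list means the ad passes."""
    problems = []
    fields = [
        ("headline_1", ad.headline_1, headline_limit),
        ("headline_2", ad.headline_2, headline_limit),
        ("description", ad.description, description_limit),
    ]
    for name, text, limit in fields:
        if len(text) > limit:
            problems.append(f"{name}: {len(text)} characters, limit is {limit}")
    combined = " ".join(text for _, text, _ in fields).lower()
    if keyword.lower() not in combined:
        problems.append(f"keyword '{keyword}' is missing")
    return problems


ad = AdVariant(
    headline_1="Project Tracking Made Simple",
    headline_2="Start Free Today",
    description="Plan, assign, and ship work in one place. Try project tracking free for 14 days.",
)
print(ad_violations(ad, keyword="project tracking"))
```

Variants that return an empty list pass the gate and move on to scoring; anything else is discarded before it reaches a human reviewer.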
What you actually need to start
You need three things: a defined asset to optimize (a page, an email, an ad), a scoring rubric for that asset (what does "better" look like, as specific criteria), and a way to run the loop (Claude Code or Cursor pointed at your instruction file). No developer, no traffic split, no A/B platform subscription required. The Autoresearch Playbook provides the instruction files — pre-built for 12 marketing use cases — so you don't have to write the scoring rubric from scratch.
Frequently asked questions
Is autoresearch the same as the Karpathy loop?
Effectively, yes. "Karpathy loop" became a popular shorthand after the method went viral in March 2026. Both terms refer to the same underlying method: propose, measure, keep or revert, repeat. This content is independently created and is not endorsed by or affiliated with Andrej Karpathy.
Do I need to know how to code to run autoresearch?
No. You write a plain-English instruction file (the program.md), install an AI coding agent like Claude Code, and point it at your file. The agent handles the iteration. You review the output. No Python, no APIs, no dev environment required beyond the agent install.
How is autoresearch different from asking ChatGPT for 10 variations?
When you ask for 10 variations, you get a flat list — nothing is evaluated, nothing is eliminated, you manually choose. The autoresearch loop evaluates each variant against a scorecard, keeps only the one that beats the baseline, and uses that winner as the starting point for the next round. Each iteration builds on the best previous result. The output isn't a flat list — it's the best version after N rounds of keep-or-revert selection.
How much does it cost to run?
A 10-variant loop on Claude Sonnet 4.6 (at $3/$15 per million input/output tokens) costs roughly $0.10–$0.40 depending on asset length. With prompt caching, costs drop 60–80%. Running on a local model via Ollama costs $0 in API fees, which is practical for non-training work like copy and headline optimization. The Autoresearch Playbook templates include a hard cost-stop parameter so a loop can never surprise-bill you.
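The arithmetic behind an estimate like this is simple enough to check yourself. The sketch below uses the per-token prices quoted above; the token counts per call and the two calls per iteration (one to propose, one to judge) are illustrative assumptions, and prompt caching is not modeled.

```python
INPUT_USD_PER_MTOK = 3.00    # price quoted above, per million input tokens
OUTPUT_USD_PER_MTOK = 15.00  # price quoted above, per million output tokens


def iteration_cost(
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    calls_per_iteration: int = 2,  # assumed: one propose call, one judge call
) -> float:
    """Estimated cost in USD of one loop iteration."""
    per_call = (
        input_tokens_per_call * INPUT_USD_PER_MTOK
        + output_tokens_per_call * OUTPUT_USD_PER_MTOK
    ) / 1_000_000
    return per_call * calls_per_iteration


def iterations_within_budget(budget_usd: float, cost_per_iteration: float) -> int:
    """How many full iterations fit under a hard cost stop."""
    return int(budget_usd // cost_per_iteration)


# Assumed sizes: about 3,000 input and 400 output tokens per call.
per_iteration = iteration_cost(input_tokens_per_call=3000, output_tokens_per_call=400)
print(f"10 iterations: ${10 * per_iteration:.2f}")
print(f"Iterations under a $0.25 cap: {iterations_within_budget(0.25, per_iteration)}")
```

With these assumed token counts, each iteration costs about $0.03, so a 10-variant loop comes to about $0.30, and a $0.25 cost stop would end the run after 8 iterations. Longer assets or longer instruction files raise the input token count and the total with it.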
What's the Autoresearch Playbook?
A 29-page guide plus 12 ready-to-run program.md templates, each one pre-built for a specific marketing or sales optimization task (landing page headlines, cold email sequences, ad copy, CTA copy, and more). The templates encode the scoring criteria for each use case so you don't build the rubric from scratch. $97 one-time, 14-day money-back guarantee.