Set a metric. Walk away. Let the agent optimize overnight.

Karpathy runs 100 experiments while he sleeps. One engineer beat off-the-shelf compression tools for $40. Here is the exact playbook to run this on any metric you care about.

Jul 03, 2026


There is a new way to work, and it fits in one sentence:

Pick a number, bound it with constraints, and let an agent push it while you sleep.

Karpathy calls it autoresearch. His repo gives an agent one file, one metric, and a fixed 5-minute budget per experiment. The agent edits, trains, keeps what improves, reverts what fails, and loops. Roughly 12 experiments an hour, about 100 overnight. Shopify’s CEO woke up to a model that beat his hand-tuned baseline. Karpathy’s own agent caught a bug he had missed for months.
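To make the edit, measure, keep-or-revert cycle concrete, here is a minimal sketch of the loop in Python. It is not Karpathy's code. The names `autoresearch_loop`, `toy_metric`, and `toy_propose` are hypothetical, the "edit" is a random nudge to two numbers instead of an agent rewriting a file, and the "metric" is a toy function standing in for a 5-minute training run.

```python
import random
from typing import Callable, List, Tuple


def autoresearch_loop(
    candidate: List[float],
    evaluate: Callable[[List[float]], float],
    propose: Callable[[List[float], random.Random], List[float]],
    n_experiments: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float, int]:
    """Greedy keep-or-revert loop. Higher metric is better."""
    rng = random.Random(seed)
    best = list(candidate)
    best_score = evaluate(best)
    kept = 0
    for _ in range(n_experiments):
        trial = propose(best, rng)   # the "edit" step
        score = evaluate(trial)      # the "train and measure" step
        if score > best_score:       # keep what improves
            best, best_score = trial, score
            kept += 1
        # otherwise revert: the trial is dropped and `best` stays as it was
    return best, best_score, kept


def toy_metric(params: List[float]) -> float:
    # Hypothetical stand-in for a real evaluation; peaks at (3, -1).
    return -((params[0] - 3.0) ** 2 + (params[1] + 1.0) ** 2)


def toy_propose(params: List[float], rng: random.Random) -> List[float]:
    # Hypothetical stand-in for the agent's edit: a small random change.
    return [p + rng.gauss(0.0, 0.5) for p in params]


if __name__ == "__main__":
    best, score, kept = autoresearch_loop([0.0, 0.0], toy_metric, toy_propose)
    print(f"kept {kept} of 100 experiments, final score {score:.4f}")
```

Because the loop only ever keeps improvements, the final score can never be worse than the starting one. The budget arithmetic is the same as in the text: 60 / 5 = 12 experiments an hour, so an 8-hour night (an assumed length) gives 96, or about 100.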

His line is the one to keep:

“Any metric you care about that is reasonably efficient to evaluate can be autoresearched by an agent swarm. It’s worth thinking about whether your problem falls into this bucket too.”

The pattern extends past machine learning. One engineer pointed it at file compression with Claude Code: 10 unsupervised iterations at about $4 each, roughly $40 in total, and the home-built algorithm beat common tools on audio and video. Zero ML involved. Just a metric, two constraints, and a loop.

That is the whole trick, and it remains a rare skill. The gap between reading about loops and shipping one is a handful of decisions most people get wrong on the first try: which metric, which constraints, which loop mechanism, and how to stop the agent from gaming you.
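A metric bounded by constraints can be as small as one scoring function. The sketch below is an illustration, not the compression experiment's actual harness: the function name `score_codec` and both constraints (a lossless round trip and a wall-clock budget) are assumptions chosen for the example, and `zlib` stands in for whatever codec the agent writes.

```python
import time
import zlib
from typing import Callable


def score_codec(
    compress: Callable[[bytes], bytes],
    decompress: Callable[[bytes], bytes],
    sample: bytes,
    max_seconds: float = 300.0,
) -> float:
    """Return the compression ratio, or 0.0 if any constraint is violated."""
    start = time.perf_counter()
    packed = compress(sample)
    restored = decompress(packed)
    elapsed = time.perf_counter() - start

    # Assumed constraint 1: the round trip must reproduce the input exactly.
    if restored != sample:
        return 0.0
    # Assumed constraint 2: the whole run must fit the time budget.
    if elapsed > max_seconds:
        return 0.0
    # The metric itself: original size divided by compressed size.
    return len(sample) / max(len(packed), 1)


if __name__ == "__main__":
    data = b"abc" * 1000
    print(score_codec(zlib.compress, zlib.decompress, data))
    # A "codec" that throws the data away scores zero instead of winning.
    print(score_codec(lambda b: b"", lambda b: b"", data))
```

The second call is the point of the design. Without the round-trip check, the best possible ratio comes from discarding the input, and an agent optimizing the raw number would find that shortcut.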

Behind the paywall, the full system:

▫️ The 5 design decisions that make or break a loop, and the defaults that work
▫️ The metric-picker framework, including the silent trap that bent the compression experiment
▫️ The 3 copy-paste prompts: the harness builder, the iteration prompt, and the metric auditor
▫️ The tooling menu: autoresearch vs Ralph loops vs /goal, /loop, and /batch, and when each wins
▫️ The constraint patterns that stop reward hacking before it starts
▫️ The cost math: what a loop costs per iteration and when the ROI turns positive
▫️ The business translation: running this on conversion, latency, and content metrics with slow feedback
▫️ The skip list: the problems where loops burn money and a human wins

One subscription unlocks every playbook

This is one system in a growing library. Premium opens all of them:

▫️ Loop engineering for coding agents

▫️ The Claude managed agents guide

▫️ The AI agent reliability playbook

Plus a fresh build every week. One overnight loop that lands a win pays the subscription back on the first run.

🔁 The Autoresearch Playbook

The 5 design decisions, the metric framework, the 3 prompts, the tooling menu, the anti-gaming constraints, and the cost math, in one system you can launch tonight.


Try premium free for 7 days. Or get 50% off this week only.

Get The Autoresearch Playbook below 👇
