Bilevel Autoresearch: When the Optimization Loop Optimizes Itself
Yaonan Qu
PAPER · v1.0 · 2026-03-23 · ai
Abstract
Autoresearch systems improve task outputs through iterative propose-evaluate-keep cycles, but they treat the search mechanism itself as fixed. We introduce bilevel autoresearch, in which an outer loop autonomously discovers, and generates code for, new search mechanisms for the inner loop. On the GPT pretraining benchmark (val_bpb), we compare three groups using the same LLM (DeepSeek): Level 1 (standard autoresearch), Level 1+1.5 (with outer-loop configuration adjustment), and Level 1+1.5+2 (with autonomous mechanism discovery). Across three independent repeats, Level 2 achieves 5x larger improvements than Level 1 alone (-0.045 vs. -0.009 val_bpb). The Level 2 agent independently invented mechanisms drawn from combinatorial optimization (Tabu Search), online learning (Multi-Scale Bandit), and experimental design (Systematic Orthogonal Exploration). Our results demonstrate that autoresearch can research itself.
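The bilevel structure in the abstract can be illustrated with a toy sketch: an inner propose-evaluate-keep loop, and an outer loop that selects among candidate proposal mechanisms by the inner loop's achieved objective. All names, the toy objective, and the candidate mechanisms here are hypothetical illustrations, not the paper's implementation.

```python
import random

def evaluate(x):
    # Toy stand-in for val_bpb: lower is better, minimized at x = 3.0.
    return (x - 3.0) ** 2

# Two hypothetical candidate search mechanisms for the outer loop to choose from.
def gaussian_propose(x, rng):
    return x + rng.gauss(0, 0.5)

def uniform_propose(x, rng):
    return x + rng.uniform(-1.0, 1.0)

def inner_loop(propose, x0, steps, rng):
    """Level 1 analogue: iterative propose-evaluate-keep search."""
    x, best = x0, evaluate(x0)
    for _ in range(steps):
        cand = propose(x, rng)
        score = evaluate(cand)
        if score < best:  # keep only improvements
            x, best = cand, score
    return best

def outer_loop(mechanisms, x0, steps, rng):
    """Level 2 analogue: run the inner loop under each candidate
    mechanism and select the one achieving the lowest objective."""
    results = {name: inner_loop(fn, x0, steps, rng)
               for name, fn in mechanisms.items()}
    return min(results, key=results.get), results

rng = random.Random(0)
best_name, results = outer_loop(
    {"gaussian": gaussian_propose, "uniform": uniform_propose},
    x0=0.0, steps=200, rng=rng)
print(best_name, {k: round(v, 4) for k, v in results.items()})
```

In the paper's setting the outer loop does not merely select among fixed mechanisms but generates new mechanism code; the sketch only conveys the two-level control flow.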