Compute Allocation in Evolutionary Search: From Depth-Breadth to Multi-Armed Bandits

About

LLM-guided evolutionary search (Evolve systems) has reached state-of-the-art results on mathematical and combinatorial tasks, yet most existing systems report only the best of many runs and leave the run-to-run distribution undocumented. We ask how a fixed budget of LLM calls should be allocated, and how reliably a single run reaches the reported numbers. Sweeping the depth-breadth grid over five models and three tasks, we identify two empirical regularities: a fitness-compute envelope along which capability ordering largely collapses on effective FLOPs, and a bilinear depth-breadth fit with task-specific interaction; both are gated by model-task capability. Motivated by these regularities, we propose BaSE (Bandit-based Self-Evolving), a multi-armed bandit that allocates LLM calls across parallel trajectories. Without changing the model, prompt, or evaluator, BaSE improves mean fitness by 12.3% over the strongest island-protocol baseline across 8 (model, task) cells, with the largest gains in high-variance settings: a reliability gain from allocation alone.
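The abstract does not say which bandit rule BaSE uses, what counts as an arm, or how reward is defined. As a minimal sketch under our own assumptions, the code below swaps in UCB1, a standard bandit rule: each parallel trajectory is an arm, one pull spends one LLM call to propose a child of that trajectory's current best program, and the reward is 1 if the child improves that trajectory's best fitness, else 0. The names `ucb1_allocate` and `make_toy_step`, the seed fitness of 0.0, and the success probabilities are hypothetical; the toy step function stands in for the LLM proposal plus the evaluator.

```python
import math
import random
from typing import Callable, List, Tuple


def ucb1_allocate(
    num_trajectories: int,
    budget: int,
    step: Callable[[int, float], float],
    c: float = math.sqrt(2.0),
) -> Tuple[List[float], List[int]]:
    """Hypothetical UCB1 allocator: spend `budget` LLM calls across trajectories.

    Assumption (not from the source): reward is 1 if a call improves the
    trajectory's best fitness, else 0, which keeps rewards in [0, 1] as UCB1 expects.
    """
    best = [0.0] * num_trajectories  # assumed seed-program fitness of 0.0
    pulls = [0] * num_trajectories   # LLM calls spent per trajectory
    wins = [0] * num_trajectories    # calls that improved that trajectory's best

    for t in range(budget):
        if t < num_trajectories:
            arm = t  # play every arm once so each has an estimate
        else:
            # UCB1 score: empirical improvement rate + exploration bonus.
            arm = max(
                range(num_trajectories),
                key=lambda i: wins[i] / pulls[i]
                + c * math.sqrt(math.log(t) / pulls[i]),
            )
        child = step(arm, best[arm])  # one (simulated) LLM call: propose and evaluate
        if child > best[arm]:
            best[arm] = child
            wins[arm] += 1
        pulls[arm] += 1
    return best, pulls


def make_toy_step(success_prob: List[float], seed: int = 0) -> Callable[[int, float], float]:
    """Hypothetical stand-in for 'LLM proposal + evaluator'."""
    rng = random.Random(seed)

    def step(arm: int, current: float) -> float:
        # Trajectory `arm` yields a better child with probability success_prob[arm].
        if rng.random() < success_prob[arm]:
            return current + 0.01
        return current - 0.01  # a worse child; the allocator keeps the old best

    return step


best, pulls = ucb1_allocate(3, 400, make_toy_step([0.1, 0.2, 0.8]))
print(pulls)  # most calls should go to trajectory 2, the most productive one
```

Binary rewards are one design choice among several: they keep UCB1's [0, 1] assumption without rescaling fitness, at the cost of ignoring how large each improvement is. A fixed depth-breadth split would instead spend the same number of calls on every trajectory regardless of how productive it is.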

Sixue Xing, Haoyu He, Kerui Wu, Zhuo Yang, Haozheng Luo, Tianfan Fu, Aarthy Nagarajan (2026)

Related benchmarks

Task	Dataset	Result
Min/Max Distance	AlphaEvolve Min Max Distance (MMD, n=16)	Generations: 451	52
Circle packing	AlphaEvolve Circle Packing n=26	Generation Count: 336	48
Geometric Optimization	CP	Fitness Score: 1.0003	21
Geometric Optimization	MMD	Fitness Score: 99.83	21
MMD	MMD	Generation Score: 114	17
Geometric Optimization	HT	Fitness Score: 0.8736	14
CP	CP	Generation Performance Score: 327	13
Heilbronn Triangle	AlphaEvolve Heilbronn Triangle n=11	Generation Count: 60	9
HT	HT	Generation Score: 209	9

