
LEVI: Stronger Search Architectures Can Substitute for Larger LLMs in Evolutionary Search

About

LLM-guided evolutionary methods such as AlphaEvolve have proven effective in domains like math, systems research, and algorithmic discovery, but their reliance on frontier models makes each run expensive. We argue this is largely an artifact of how existing frameworks allocate search: archives that fail to preserve solution diversity force compensation through stronger mutation models; blind model use spends frontier dollars on local edits a smaller model could handle; and full-set evaluation wastes rollouts on redundant examples. We introduce LEVI, a harness-first evolutionary framework built on the bet that stronger search architectures can substitute for or even outperform larger LLMs in evolutionary search. LEVI improves on three core components of evolutionary search: a solution database that establishes diversity from the beginning, and then maintains it throughout the run; a smarter mutation router that plays into the strengths of large and small LLMs; and a rank-preserving proxy benchmark for rollout-heavy settings. Across systems-research benchmarks LEVI attains the highest score on a budget 3.3-6.7x smaller than the published frontier-model runs of existing frameworks like ShinkaEvolve, GEPA, and AdaEvolve; on one problem, LEVI matches the existing best at a 35x lower cost. On prompt optimization, LEVI matches or exceeds GEPA at less than half of its rollout budget on four different benchmarks. LEVI is available as an open-source framework at https://github.com/ttanv/levi.
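To make the idea of a "rank-preserving proxy benchmark" concrete, the sketch below checks whether scoring candidates on a small subset of examples orders them the same way as scoring on the full set. It is an illustration only, not LEVI's actual procedure: the helper names (`mean_scores`, `kendall_tau`), the toy score matrix, and the use of Kendall's tau as the agreement measure are all assumptions made for this example.

```python
from itertools import combinations


def mean_scores(score_matrix, example_ids):
    """Average each candidate's per-example scores over the chosen examples."""
    ids = list(example_ids)
    return [sum(row[i] for i in ids) / len(ids) for row in score_matrix]


def kendall_tau(a, b):
    """Kendall tau-a between two equal-length score lists.

    Pairs tied in either list count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: 4 candidate solutions (rows) scored on 6 examples (columns).
scores = [
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]

full = mean_scores(scores, range(6))      # ranking on the full evaluation set
proxy = mean_scores(scores, [1, 3, 4])    # ranking on a 3-example proxy subset
tau = kendall_tau(full, proxy)            # 1.0 means the proxy preserves the full ordering
```

In this toy case the subset `[1, 3, 4]` orders the four candidates exactly as the full set does, while a subset such as `[2, 5]` (examples every candidate solves) gives no ranking signal at all. A proxy of the first kind would let a search loop spend half as many rollouts per candidate without changing which candidates it prefers.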

Temoor Tanveer • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Combinatorial scheduling optimization | Txn Sched. | Performance Score: 4.46e+3 | 6 |
| Database optimization | LLM-SQL | Performance Score: 0.7985 | 6 |
| LLM serving optimization | EPLB | Performance Score: 15.23 | 6 |
| Networking optimization | Cloudcast | Performance Score: 578.1 | 6 |
| Distributed systems optimization | PRISM | Performance Score: 26.26 | 6 |
| Distributed systems optimization | Spot-M | Performance Score: 72.4 | 4 |
| Distributed systems optimization | Spot S | Performance Score: 51.7 | 4 |
| Prompt Optimization | HotpotQA | Score: 63 | 4 |
| Prompt Optimization | IFBench | Score: 46.33 | 4 |
| Prompt Optimization | GEPA Evaluation Suite Aggregate | Aggregate Score: 62.02 | 4 |
Showing 10 of 12 rows
