LEVI: Stronger Search Architectures Can Substitute for Larger LLMs in Evolutionary Search

About

LLM-guided evolutionary methods such as AlphaEvolve have proven effective in domains like math, systems research, and algorithmic discovery, but their reliance on frontier models makes each run expensive. We argue this is largely an artifact of how existing frameworks allocate search: archives that fail to preserve solution diversity force compensation through stronger mutation models; blind model use spends frontier dollars on local edits a smaller model could handle; and full-set evaluation wastes rollouts on redundant examples. We introduce LEVI, a harness-first evolutionary framework built on the bet that stronger search architectures can substitute for, or even outperform, larger LLMs in evolutionary search. LEVI improves on three core components of evolutionary search: a solution database that establishes diversity from the start and maintains it throughout the run; a mutation router that plays to the respective strengths of large and small LLMs; and a rank-preserving proxy benchmark for rollout-heavy settings. Across systems-research benchmarks, LEVI attains the highest score on a budget 3.3x to 6.7x smaller than the published frontier-model runs of existing frameworks such as ShinkaEvolve, GEPA, and AdaEvolve; on one problem, LEVI matches the existing best at 35x lower cost. On prompt optimization, LEVI matches or exceeds GEPA with less than half of its rollout budget on four different benchmarks. LEVI is available as an open-source framework at https://github.com/ttanv/levi.
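To make the idea of a "rank-preserving proxy benchmark" concrete, here is a minimal sketch. It is not taken from the LEVI codebase: the function names (`spearman`, `best_proxy`), the exhaustive subset search, and the toy score matrix are all assumptions made for illustration. The sketch picks a small subset of evaluation examples whose mean score orders candidate solutions as closely as possible to the full evaluation set, measured by Spearman rank correlation.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(a, b):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        return 0.0  # a constant ranking carries no ordering information
    return cov / (va * vb) ** 0.5


def subset_scores(scores, subset):
    """Mean score of each candidate over the chosen example indices."""
    return [mean(row[j] for j in subset) for row in scores]


def best_proxy(scores, k):
    """Hypothetical selector: try every k-example subset and keep the one
    whose candidate ranking agrees most with the full-set ranking."""
    n = len(scores[0])
    full = subset_scores(scores, range(n))
    return max(
        combinations(range(n), k),
        key=lambda s: spearman(subset_scores(scores, s), full),
    )


# Toy data (made up): scores[candidate][example], 1 = pass, 0 = fail.
scores = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
]
proxy = best_proxy(scores, k=2)
full = subset_scores(scores, range(4))
print(proxy, spearman(subset_scores(scores, proxy), full))
```

Exhaustive search is only feasible for tiny example sets; a greedy or sampled selection would be the natural substitute at realistic sizes, and the approach LEVI actually uses may differ.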

Temoor Tanveer • 2026

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Combinatorial scheduling optimization	Txn Sched.	Performance Score	4.46e+3	6
Database optimization	LLM-SQL	Performance Score	0.7985	6
LLM serving optimization	EPLB	Performance Score	15.23	6
Networking optimization	Cloudcast	Performance Score	578.1	6
Distributed systems optimization	PRISM	Performance Score	26.26	6
Distributed systems optimization	Spot-M	Performance Score	72.4	4
Distributed systems optimization	Spot S	Performance Score	51.7	4
Prompt optimization	HotpotQA	Score	63	4
Prompt optimization	IFBench	Score	46.33	4
Prompt optimization	GEPA Evaluation Suite Aggregate	Aggregate Score	62.02	4

Showing 10 of 12 rows
