ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

About

We introduce ShinkaEvolve, a new open-source framework that leverages large language models (LLMs) to advance scientific discovery with state-of-the-art performance and unprecedented efficiency. Recent advances in scaling the inference-time compute of LLMs have enabled significant progress in generalized scientific discovery. These approaches rely on evolutionary agentic harnesses that use LLMs as mutation operators to generate candidate solutions. However, current code evolution methods suffer from critical limitations: they are sample-inefficient, requiring thousands of samples to identify effective solutions, and they remain closed-source, hindering broad adoption and extension. ShinkaEvolve addresses these limitations with three key innovations: a parent sampling technique that balances exploration and exploitation, code novelty rejection sampling for efficient exploration of the search space, and a bandit-based strategy for selecting among an ensemble of LLMs. We evaluate ShinkaEvolve across diverse tasks, demonstrating consistent improvements in sample efficiency and solution quality. ShinkaEvolve discovers a new state-of-the-art circle packing solution using only 150 samples, designs high-performing agentic harnesses for AIME mathematical reasoning tasks, identifies improvements to ALE-Bench competitive programming solutions, and discovers novel mixture-of-experts load-balancing loss functions that illuminate the space of optimization strategies. Our results demonstrate that ShinkaEvolve achieves broad applicability with exceptional sample efficiency. By providing open-source accessibility and cost efficiency, this work democratizes open-ended discovery across diverse computational problems.
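The abstract names three search components but does not give their formulas. The Python sketch below shows one plausible instance of each, under explicit assumptions: a softmax-over-fitness parent sampler with a hypothetical offspring-count penalty, novelty rejection by cosine similarity of code embeddings against a hypothetical threshold, and a UCB1 bandit over LLMs. UCB1 is a standard bandit rule swapped in for illustration; ShinkaEvolve's exact sampling weights, similarity measure, and bandit update may differ. All names (`sample_parent`, `is_novel`, `UCB1Selector`) are hypothetical.

```python
import math
import random
from typing import Hashable, Iterable, List, Optional, Sequence


def sample_parent(fitness: Sequence[float], n_children: Sequence[int],
                  temperature: float = 1.0,
                  rng: Optional[random.Random] = None) -> int:
    """Pick a parent index from the archive (hypothetical weighting scheme).

    exp(fitness / temperature) favors strong programs (exploitation);
    dividing by (1 + n_children) favors under-explored ones (exploration).
    """
    rng = rng or random.Random()
    best = max(fitness)  # subtract the max so exp() cannot overflow
    weights = [math.exp((f - best) / temperature) / (1 + c)
               for f, c in zip(fitness, n_children)]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def is_novel(candidate: Sequence[float], archive: Iterable[Sequence[float]],
             threshold: float = 0.95) -> bool:
    """Reject a candidate whose code embedding is too close to any archived one."""
    return all(cosine(candidate, e) < threshold for e in archive)


class UCB1Selector:
    """UCB1 bandit choosing which LLM proposes the next mutation (assumption)."""

    def __init__(self, arms: Iterable[Hashable]):
        self.arms: List[Hashable] = list(arms)
        self.counts = {a: 0 for a in self.arms}
        self.means = {a: 0.0 for a in self.arms}  # running mean reward per arm
        self.total = 0

    def select(self) -> Hashable:
        for a in self.arms:  # play every arm once before using the UCB score
            if self.counts[a] == 0:
                return a
        return max(self.arms, key=lambda a: self.means[a]
                   + math.sqrt(2 * math.log(self.total) / self.counts[a]))

    def update(self, arm: Hashable, reward: float) -> None:
        # Reward could be, e.g., the child's fitness gain over its parent (assumption).
        self.counts[arm] += 1
        self.total += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

In a loop of this shape, each generation would call `select()` to pick a model, `sample_parent` to pick the program to mutate, discard the proposal if `is_novel` fails, and otherwise evaluate it and feed the outcome back through `update`.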

Robert Tjarko Lange, Yuki Imajuku, Edoardo Cetin · 2025

Related benchmarks

Task	Dataset	Metric	Result
Min/Max Distance	AlphaEvolve Min Max Distance (MMD, n=16)	Generations	210	52
Circle packing	AlphaEvolve Circle Packing n=26	Generation Count	146	48
Aerodynamic Shape Optimization	ShapeBench All tasks	Median Normalized Rank	0.8	35
Kernel Optimization	KernelBench 1.0 (test)	Latency (µs)	0.476	27
Circle packing	Circle Packing (n=26)	Sum of Radii	2.636	25
Geometric Optimization	CP	Fitness Score	0.9986	21
Geometric Optimization	MMD	Fitness Score	99.24	21
Math Optimization	Circle Packing Rect	Best Value	2.3658	20
Auto-correlation Inequality Minimization	ThirdAutoCorrIneq	Best Score	1.4614	18
MMD	MMD	Generation Score	71	17

Showing 10 of 77 rows
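The circle packing rows report a sum-of-radii score (2.636 for n=26). Assuming the common AlphaEvolve-style formulation (place n non-overlapping circles inside the unit square and maximize the total radius), a candidate program's output could be scored by a checker like the sketch below. The function name `packing_score`, the tolerance, and the choice to return 0.0 for infeasible packings are illustrative assumptions, not ShinkaEvolve's actual evaluator.

```python
import math
from typing import List, Tuple

Circle = Tuple[float, float, float]  # (x, y, r): center coordinates and radius


def packing_score(circles: List[Circle], tol: float = 1e-9) -> float:
    """Return the sum of radii for a valid packing in [0, 1]^2, else 0.0."""
    # Every circle must have a non-negative radius and lie inside the unit square.
    for x, y, r in circles:
        if r < 0:
            return 0.0
        if x - r < -tol or x + r > 1 + tol or y - r < -tol or y + r > 1 + tol:
            return 0.0
    # No two circles may overlap: center distance must be at least r_i + r_j.
    for i in range(len(circles)):
        xi, yi, ri = circles[i]
        for j in range(i + 1, len(circles)):
            xj, yj, rj = circles[j]
            if math.hypot(xi - xj, yi - yj) < ri + rj - tol:
                return 0.0
    return sum(r for _, _, r in circles)
```

For example, four circles of radius 0.25 centered on a 2×2 grid score 1.0. Returning 0.0 for infeasible packings gives a hard fitness cliff; an evolutionary loop might instead subtract a penalty proportional to the constraint violation to give the search a smoother signal.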

