Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning

About

Discovering efficient algorithms for complex problems is a long-standing challenge in mathematics and computer science, and one that has required substantial human expertise. Recent advances in evolutionary search with large language models (LLMs) have shown promise in accelerating algorithm discovery across various domains, particularly mathematics and optimization. However, existing approaches treat the LLM as a static generator, missing the opportunity to update the model with the signal obtained from evolutionary exploration. In this work, we propose to augment LLM-based evolutionary search by continuously refining the search operator (the LLM) through reinforcement learning (RL) fine-tuning. Our method uses evolutionary search as an exploration strategy to discover improved algorithms, while RL optimizes the LLM policy based on these discoveries. Our experiments on combinatorial optimization tasks show that integrating RL with evolutionary search accelerates the discovery of superior algorithms, showcasing the potential of RL-enhanced evolutionary strategies for algorithm design.

Anja Surina, Amin Mansouri, Lars Quaedvlieg, Amal Seddas, Maryna Viazovska, Emmanuel Abbe, Caglar Gulcehre • 2025
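
The interplay described in the abstract, where search explores and the generator is updated from what the search finds, can be illustrated with a toy sketch. This is not the authors' implementation: the LLM is replaced by a tiny softmax policy over three hypothetical mutation step sizes, the "algorithm" being evolved is a single number scored by a made-up `fitness` function, and the policy update is a plain REINFORCE-style rule chosen only for brevity. All names and constants here are assumptions for illustration.

```python
import math
import random


def fitness(x):
    # Toy stand-in for "algorithm quality": higher is better, peak at x = 3.
    return -(x - 3.0) ** 2


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def evolve_with_policy_updates(steps=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(-10, 10) for _ in range(8)]
    step_sizes = [0.01, 0.1, 1.0]     # hypothetical mutation operators
    logits = [0.0] * len(step_sizes)  # stand-in policy (plays the LLM's role)
    baseline = 0.0                    # running average reward, reduces variance

    for _ in range(steps):
        # Exploration: pick a parent by tournament, let the policy propose a child.
        parent = max(rng.sample(population, 3), key=fitness)
        probs = softmax(logits)
        action = rng.choices(range(len(step_sizes)), weights=probs)[0]
        child = parent + rng.gauss(0.0, step_sizes[action])

        # Reward: clipped fitness improvement of the child over its parent.
        reward = max(-1.0, min(1.0, fitness(child) - fitness(parent)))

        # Policy update: d log pi(action) / d logit_i = 1[i == action] - probs[i].
        advantage = reward - baseline
        for i in range(len(logits)):
            indicator = 1.0 if i == action else 0.0
            logits[i] += lr * advantage * (indicator - probs[i])
        baseline += 0.1 * (reward - baseline)

        # Evolutionary step: the child replaces the worst member if it is better.
        worst = min(range(len(population)), key=lambda i: fitness(population[i]))
        if fitness(child) > fitness(population[worst]):
            population[worst] = child

    best = max(population, key=fitness)
    return best, softmax(logits)


best, operator_probs = evolve_with_policy_updates()
```

In this sketch the population keeps the search grounded in the best candidates found so far, while the policy gradually shifts probability toward whichever mutation operator has been producing improvements. In the setting the abstract describes, the candidates would be programs and the policy would be the LLM itself.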

Related benchmarks

Task	Dataset	Metric	Result
Bin Packing	Bin Packing (val)	Mean Optimality Gap	3.12	18
Bin Packing	Bin Packing Perturbed Set (val)	Mean Optimality Gap	2.66	18
Bin Packing	Bin Packing (test)	Mean Optimality Gap	2.8	18
Flatpack	Flatpack (val)	Mean Optimality Gap	0.105	18
Flatpack	Flatpack Perturbed (val)	Mean Optimality Gap	0.099	18
Flatpack	Flatpack (test)	Mean Optimality Gap	0.113	18
Traveling Salesman Problem	Traveling Salesman Problem (val)	Mean Optimality Gap	2.504	18
Traveling Salesman Problem	Traveling Salesman Problem Perturbed (val)	Mean Optimality Gap	2.894	18
Traveling Salesman Problem	Traveling Salesman Problem (test)	Mean Optimality Gap	2.534	18
Traveling Salesperson Problem	TSPLIB eil51 ch150	Optimality Gap	0.67	12

Showing 10 of 47 rows
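
The listing does not define "Mean Optimality Gap" or state its units. A minimal sketch, assuming the common definition for minimization problems (relative excess cost over a known optimal or reference cost, averaged over instances and expressed in percent), could look like this. The function name and the sample numbers are illustrative, not taken from the paper.

```python
def mean_optimality_gap(costs, optimal_costs):
    """Average of 100 * (cost - optimal) / optimal over all instances.

    Assumes a minimization problem with strictly positive optimal costs.
    """
    if not costs or len(costs) != len(optimal_costs):
        raise ValueError("need two non-empty sequences of equal length")
    gaps = [100.0 * (c - o) / o for c, o in zip(costs, optimal_costs)]
    return sum(gaps) / len(gaps)


# Two made-up instances: 3% and 2% above optimal.
example_gap = mean_optimality_gap([103, 51], [100, 50])  # 2.5
```

Under this definition lower is better, and a value of 0 means every instance was solved to the reference cost.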
