CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design
About
Tackling complex optimization problems often relies on expert-designed heuristics, typically crafted through extensive trial and error. Recent advances demonstrate that large language models (LLMs), when integrated into well-designed evolutionary search frameworks, can autonomously discover high-performing heuristics at a fraction of the traditional cost. However, existing approaches predominantly rely on verbal guidance, i.e., manipulating the prompt generation process, to steer the evolution of heuristics, without adapting the underlying LLM. We propose a hybrid framework that combines verbal and numerical guidance, the latter achieved by fine-tuning the LLM via reinforcement learning based on the quality of generated heuristics. This joint optimization allows the LLM to co-evolve with the search process. Our method outperforms state-of-the-art (SOTA) baselines across various optimization tasks, running locally on a single 24GB GPU using a 7B model with INT4 quantization. It surpasses methods that rely solely on verbal guidance, even when those use significantly more powerful API-based models.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Combinatorial Optimization | MKP-ACO | Mean Value (N=100)16.906 | 13 | |
| Traveling Salesperson Problem (Constructive) | TSP-Constructive N=100 (val) | Mean Tour Length8.852 | 13 | |
| Combinatorial Optimization | OVRP-Constructive | Mean Objective (N=50)12.83 | 13 | |
| Capacitated Vehicle Routing Problem (ACO) | CVRP-ACO N=200 (val) | Mean Objective Value30.243 | 13 | |
| Combinatorial Optimization | OP-ACO | Mean (N=50)14.989 | 13 | |
| Capacitated Vehicle Routing Problem (ACO) | CVRP-ACO N=100 (val) | Mean Objective Value16.612 | 13 | |
| Traveling Salesperson Problem (Constructive) | TSP-Constructive N=200 (val) | Mean Objective Value13.159 | 13 | |
| Traveling Salesperson Problem (ACO) | TSP-ACO N=100 (val) | Mean Tour Length8.607 | 13 | |
| Traveling Salesperson Problem (ACO) | TSP-ACO N=200 (val) | Mean Cost12.672 | 13 | |
| Capacitated Vehicle Routing Problem (Constructive) | CVRP-Constructive N=100 (val) | Mean Cost24.134 | 13 |