Understanding the Importance of Evolutionary Search in Automated Heuristic Design with Large Language Models
About
Automated heuristic design (AHD) has gained considerable attention for its potential to automate the development of effective heuristics. The recent advent of large language models (LLMs) has opened a new avenue for AHD, with initial efforts framing AHD as an evolutionary program search (EPS) problem. However, inconsistent benchmark settings, inadequate baselines, and a lack of detailed component analysis have left two questions inadequately answered: whether LLMs need to be integrated with search strategies, and how much progress existing LLM-based EPS methods have actually achieved. This work addresses these questions through a large-scale benchmark comprising four LLM-based EPS methods and four AHD problems across nine LLMs and five independent runs. Our extensive experiments yield meaningful insights, providing empirical grounding for the importance of evolutionary search in LLM-based AHD approaches while also informing future EPS algorithm development. To foster accessibility and reproducibility, we have fully open-sourced our benchmark and corresponding results.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Automated Heuristic Discovery | AHD Individual (Instance-wise) | Average Tardiness | 5.57e+3 | 28 |
| Bi-objective Flexible Job Shop Scheduling Problem | Bi-FJSP (test) | Hypervolume | 0.2303 | 16 |
| Bi-objective Flexible Job Shop Scheduling Problem | Bi-FJSP All instances | Hypervolume | 0.2263 | 16 |
| Bi-objective Flexible Job Shop Scheduling Problem | Bi-FJSP (train) | Hypervolume | 0.1702 | 16 |
| Automated Heuristic Discovery | AHD Instances (train) | Average Tardiness | 7.87e+3 | 9 |
| Automated Heuristic Discovery | AHD Instances (All) | Average Tardiness | 6.01e+3 | 9 |
| Automated Heuristic Discovery | AHD Instances (test) | Average Tardiness | 4.15e+3 | 9 |