READY: Reward Discovery for Meta-Black-Box Optimization

About

Meta-Black-Box Optimization (MetaBBO) is an emerging avenue within the optimization community, in which an algorithm-design policy is meta-learned by reinforcement learning to enhance optimization performance. So far, the reward functions in existing MetaBBO works have been designed by human experts, which introduces design bias and a risk of reward hacking. In this paper, we use a Large Language Model (LLM) as an automated reward discovery tool for MetaBBO. Specifically, we address both effectiveness and efficiency. For effectiveness, we borrow the idea of evolution of heuristics and introduce a tailored evolution paradigm into the iterative LLM-based program search process, which ensures continuous improvement. For efficiency, we additionally introduce a multi-task evolution architecture that supports parallel reward discovery for diverse MetaBBO approaches. This parallel process also benefits from knowledge sharing across tasks, which accelerates convergence. Empirical results demonstrate that the reward functions discovered by our approach can boost existing MetaBBO works, underscoring the importance of reward design in MetaBBO. READY's project is available at https://anonymous.4open.science/r/ICML_READY-747F.
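The abstract describes an iterative, evolution-style search over reward programs that runs for several MetaBBO approaches in parallel and shares knowledge between them. The sketch below is only a toy illustration of that loop shape and is not the READY implementation. It assumes a hypothetical two-weight reward template (`RewardCandidate`), replaces the LLM call with random perturbation (`propose_variant`), and uses made-up quadratic fitness functions (`make_toy_fitness`) in place of actually training a MetaBBO agent. All names and parameters are assumptions introduced for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class RewardCandidate:
    """Hypothetical reward template with two tunable weights (an assumption)."""
    w_improve: float
    w_cost: float

    def reward(self, prev_best: float, new_best: float, step_cost: float = 1.0) -> float:
        # Relative improvement of the best objective value, minus a step penalty.
        improvement = (prev_best - new_best) / (abs(prev_best) + 1e-12)
        return self.w_improve * improvement - self.w_cost * step_cost


def propose_variant(parent: RewardCandidate, rng: random.Random) -> RewardCandidate:
    """Stand-in for the LLM proposal step: randomly perturbs the parent's weights."""
    return RewardCandidate(
        w_improve=parent.w_improve + rng.gauss(0.0, 0.2),
        w_cost=max(0.0, parent.w_cost + rng.gauss(0.0, 0.05)),
    )


def make_toy_fitness(target_improve: float, target_cost: float) -> Callable[[RewardCandidate], float]:
    """Made-up fitness; a real one would train and evaluate a MetaBBO agent."""
    def fitness(c: RewardCandidate) -> float:
        return -((c.w_improve - target_improve) ** 2) - (c.w_cost - target_cost) ** 2
    return fitness


def evolve_multitask(
    tasks: Dict[str, Callable[[RewardCandidate], float]],
    generations: int = 20,
    pop_size: int = 4,
    share_every: int = 5,
    seed: int = 0,
) -> Tuple[Dict[str, RewardCandidate], Dict[str, List[float]]]:
    rng = random.Random(seed)
    pops = {name: [RewardCandidate(1.0, 0.1)] for name in tasks}
    history: Dict[str, List[float]] = {name: [] for name in tasks}

    for gen in range(1, generations + 1):
        # One evolution step per task (sequential here; conceptually parallel).
        for name, fitness in tasks.items():
            parents = pops[name]
            children = [propose_variant(rng.choice(parents), rng) for _ in range(pop_size)]
            # Elitist selection: parents compete with children, so the best never gets worse.
            pops[name] = sorted(parents + children, key=fitness, reverse=True)[:pop_size]
            history[name].append(fitness(pops[name][0]))

        # Knowledge sharing: every few generations, each task also considers
        # the current best candidates of the other tasks.
        if gen % share_every == 0:
            bests = {name: pops[name][0] for name in tasks}
            for name, fitness in tasks.items():
                migrants = [c for other, c in bests.items() if other != name]
                pops[name] = sorted(pops[name] + migrants, key=fitness, reverse=True)[:pop_size]

    return {name: pops[name][0] for name in tasks}, history


if __name__ == "__main__":
    toy_tasks = {"task_a": make_toy_fitness(2.0, 0.0), "task_b": make_toy_fitness(1.5, 0.2)}
    best, _ = evolve_multitask(toy_tasks)
    for task_name, candidate in best.items():
        print(task_name, candidate)
```

In this sketch the elitist selection is what makes the best fitness per task non-decreasing across generations. Whether READY uses the same selection or sharing scheme is not stated in the abstract.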

Zechuan Huang, Zhiguang Cao, Hongshu Guo, Yue-Jiao Gong, Zeyuan Ma • 2026

Related benchmarks

Task	Dataset	Result
Meta-Black-Box Optimization (DEDQN)	BBOB (test)	Attractive Sector: 2.35e+3	5
Meta-Black-Box Optimization (RLDAS)	BBOB suite (test)	Attractive Sector: 0.128	5
Meta-Black-Box Optimization (RLEPSO)	BBOB (test)	Attractive Sector: 0.0041	5
