
Offline Model-Based Optimization via Policy-Guided Gradient Search

About

Offline optimization is an emerging problem in many experimental engineering domains, including protein, drug, and aircraft design, where online experimentation to collect evaluation data is too expensive or dangerous. In this setting, one has to optimize an unknown function given only its offline evaluations at a fixed set of inputs. A naive solution is to learn a surrogate model of the unknown function and optimize this surrogate instead. However, such a naive optimizer is prone to erroneous overestimation by the surrogate (possibly due to over-fitting on a biased sample of function evaluations) on inputs outside the offline dataset. Prior approaches addressing this challenge have primarily focused on learning robust surrogate models. However, their search strategies are derived from the surrogate model rather than from the actual offline data. To fill this gap, we introduce a new learning-to-search perspective for offline optimization by reformulating it as an offline reinforcement learning problem. Our proposed policy-guided gradient search approach explicitly learns the best policy for a given surrogate model created from the offline data. Our empirical results on multiple benchmarks demonstrate that the learned optimization policy can be combined with existing offline surrogates to significantly improve optimization performance.

Yassine Chemingui, Aryan Deshwal, Trong Nghia Hoang, Janardhan Rao Doppa • 2024
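
The naive baseline described in the abstract (fit a surrogate on offline data, then run gradient ascent on it) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy objective `true_f`, the linear surrogate, the sampling range, and the step size are hypothetical and are not taken from the paper. The sketch shows only the failure mode of the naive optimizer, not the policy-guided method itself.

```python
import numpy as np

# Hypothetical hidden objective (unknown to the optimizer), maximized at x = (1, 1).
def true_f(x):
    return -np.sum((x - 1.0) ** 2, axis=-1)

# Offline dataset drawn from a biased region that does not contain the optimum.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 0.5, size=(200, 2))
y = true_f(X)

# Illustrative surrogate: linear least squares, f_hat(x) = w[0] + w[1:] . x
Phi = np.hstack([np.ones((len(X), 1)), X])
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

def surrogate(x):
    return w[0] + x @ w[1:]

def surrogate_grad(x):
    # The gradient of a linear model is constant in x.
    return w[1:]

def naive_gradient_search(x0, step_size=0.05, n_steps=50):
    """Plain gradient ascent on the surrogate, starting from x0."""
    x = x0.copy()
    for _ in range(n_steps):
        x = x + step_size * surrogate_grad(x)
    return x

# Start from the best point in the offline data, as is common in this setting.
x_start = X[np.argmax(y)]
x_final = naive_gradient_search(x_start)

print("surrogate prediction at final point:", surrogate(x_final))
print("true value at final point:          ", true_f(x_final))
print("true value at starting point:       ", true_f(x_start))
```

In this toy setup the surrogate keeps promising improvement as the search leaves the region covered by the data, while the true objective gets worse than at the starting point. Controlling how far and in which way the search moves away from the offline data is the kind of decision a learned search policy is meant to make.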

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Black-box Optimization | D'Kitty | Normalized Median Score | 0.941 | 25 |
| Offline Black-box Optimization | SuperC | Normalized Median Score | 37.9 | 25 |
| Offline Black-box Optimization | Ant | Normalized Median Score | 0.532 | 25 |
| Offline Black-box Optimization | LLM-DM | Normalized Median Score | 58 | 25 |
| Offline Black-box Optimization | TF8 | Normalized Median Score | 37.5 | 25 |
| Offline Black-box Optimization | TF10 | Normalized Median Score | 0.443 | 25 |
| Offline Black-box Optimization | Overall Task Suite (SuperC, Ant, D'Kitty, LLM-DM, TF8, TF10) | Mean Rank | 16.7 | 24 |
| Offline Model-Based Optimization | D'Kitty Morphology Design-Bench | 100th Percentile Score | 95.4 | 23 |
| Offline Model-Based Optimization | Ant Morphology Design-Bench | 100th Percentile Score | 0.715 | 23 |
| Offline Model-Based Optimization | Superconductor Design-Bench | Score (P100) | 44.4 | 22 |
Showing 10 of 16 rows
