Parallel-mentoring for Offline Model-based Optimization

About

We study offline model-based optimization, which aims to maximize a black-box objective function using only a static dataset of designs and scores. These designs span a variety of domains, including materials, robots and DNA sequences. A common approach trains a proxy on the static dataset to approximate the black-box objective function and then performs gradient ascent on the proxy to obtain new designs. However, this often yields poor designs because the proxy is inaccurate on out-of-distribution designs. Recent studies indicate that: (a) gradient ascent with a mean ensemble of proxies generally outperforms simple gradient ascent, and (b) a trained proxy provides weak ranking supervision signals for design selection. Motivated by (a) and (b), we propose parallel-mentoring, an effective and novel method that facilitates mentoring among parallel proxies, creating a more robust ensemble to mitigate the out-of-distribution issue. We focus on the three-proxy case, and our method consists of two modules. The first module, voting-based pairwise supervision, operates on three parallel proxies and captures their ranking supervision signals as pairwise comparison labels. These labels are combined through majority voting to generate consensus labels, which incorporate ranking supervision signals from all proxies and enable mutual mentoring. However, the consensus can be incorrect, which introduces label noise. To alleviate this, we introduce an adaptive soft-labeling module with soft-labels initialized as consensus labels. Based on bi-level optimization, this module fine-tunes proxies in the inner level and learns more accurate labels in the outer level to adaptively mentor proxies, resulting in a more robust ensemble. Experiments validate the effectiveness of our method. Our code is available here.
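The voting step described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each proxy is represented only by an array of scores it has already predicted for a small set of candidate designs, and the helper names `pairwise_labels` and `consensus_labels` are hypothetical. It shows only how pairwise comparison labels from three proxies could be merged by majority vote. The adaptive soft-labeling module and the gradient-ascent loop are not covered.

```python
import numpy as np


def pairwise_labels(scores):
    """Return a matrix where entry [i, j] is 1 if design i is scored above design j."""
    scores = np.asarray(scores, dtype=float)
    return (scores[:, None] > scores[None, :]).astype(int)


def consensus_labels(proxy_scores):
    """Majority vote over the pairwise labels of all proxies.

    proxy_scores has shape (n_proxies, n_designs). This is an assumed
    representation: each row holds one proxy's predicted scores.
    """
    votes = np.stack([pairwise_labels(s) for s in proxy_scores])
    n_proxies = votes.shape[0]
    # A pair is labeled 1 only if more than half of the proxies agree.
    return (2 * votes.sum(axis=0) > n_proxies).astype(int)


# Hypothetical predictions from three proxies on three candidate designs.
proxy_scores = np.array([
    [0.9, 0.2, 0.5],  # proxy 1 ranks design 0 > 2 > 1
    [0.8, 0.6, 0.4],  # proxy 2 ranks design 0 > 1 > 2
    [0.1, 0.3, 0.7],  # proxy 3 ranks design 2 > 1 > 0
])

consensus = consensus_labels(proxy_scores)
print(consensus)
```

In this example two of the three proxies agree that design 0 beats designs 1 and 2, and that design 2 beats design 1, so those are the pairs that receive a consensus label of 1. A proxy that disagrees with the consensus on a pair (here, proxy 3 on the pairs involving design 0) is the one that would receive a corrective signal from the other two.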

Can Chen, Christopher Beckham, Zixuan Liu, Xue Liu, Christopher Pal • 2023

Related benchmarks

Task | Dataset | Result | Rank
Offline Black-box Optimization | TF8 | Normalized Median Score: 60.9 | 25
Offline Black-box Optimization | TF10 | Normalized Median Score: 0.527 | 25
Offline Black-box Optimization | Ant | Normalized Median Score: 0.606 | 25
Offline Black-box Optimization | D'Kitty | Normalized Median Score: 0.866 | 25
Offline Black-box Optimization | SuperC | Normalized Median Score: 35.5 | 25
Offline Black-box Optimization | LLM-DM | Normalized Median Score: 74 | 25
Offline Black-box Optimization | Overall Task Suite (SuperC, Ant, D'Kitty, LLM-DM, TF8, TF10) | Mean Rank: 11.5 | 24
Offline Black-box Optimization | Design-bench 100-th percentile | TFBIND8 Score: 97 | 20
Discrete Optimization | TF Bind 8 | Median Normalized Score: 60.9 | 16
Offline Model-Based Optimization | Ant Morphology (test) | Median Normalized Score: 0.606 | 16
Showing 10 of 22 rows
