Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Vector Optimization with Stochastic Bandit Feedback

About

We introduce vector optimization problems with stochastic bandit feedback, in which preferences among designs are encoded by a polyhedral ordering cone $C$. Our setup generalizes the best arm identification problem to vector-valued rewards by extending the concept of Pareto set beyond multi-objective optimization. We characterize the sample complexity of ($\epsilon,\delta$)-PAC Pareto set identification by defining a new cone-dependent notion of complexity, called the ordering complexity. In particular, we provide gap-dependent and worst-case lower bounds on the sample complexity and show that, in the worst-case, the sample complexity scales with the square of ordering complexity. Furthermore, we investigate the sample complexity of the na\"ive elimination algorithm and prove that it nearly matches the worst-case sample complexity. Finally, we run experiments to verify our theoretical results and illustrate how $C$ and sampling budget affect the Pareto set, the returned ($\epsilon,\delta$)-PAC Pareto set, and the success of identification.

\c{C}a\u{g}{\i}n Ararat, Cem Tekin• 2021

Related benchmarks

TaskDatasetResultRank
Vector Optimization (Acute Cone)VS
Epsilon-F10.93
3
Vector Optimization (Obtuse Cone)VS
Epsilon-F190
3
Vector Optimization (Right Cone)VS
Epsilon-F195
3
Vector Optimization (Right Cone)SnAr
epsilon-F189
3
Vector Optimization (Obtuse Cone)BC
Epsilon-F198
3
Vector Optimization (Acute Cone)BC
Epsilon-F189
3
Vector Optimization (Acute Cone)LAC
Epsilon-F195
3
Vector Optimization (Acute Cone)SnAr
Epsilon-F189
3
Vector Optimization (Obtuse Cone)LAC
Epsilon-F1 Score93
3
Vector Optimization (Obtuse Cone)SnAr
epsilon-F186
3
Showing 10 of 12 rows

Other info

Follow for update