Vector Optimization with Stochastic Bandit Feedback

About

We introduce vector optimization problems with stochastic bandit feedback, in which preferences among designs are encoded by a polyhedral ordering cone $C$. Our setup generalizes the best arm identification problem to vector-valued rewards by extending the concept of Pareto set beyond multi-objective optimization. We characterize the sample complexity of ($\epsilon,\delta$)-PAC Pareto set identification by defining a new cone-dependent notion of complexity, called the ordering complexity. In particular, we provide gap-dependent and worst-case lower bounds on the sample complexity and show that, in the worst-case, the sample complexity scales with the square of ordering complexity. Furthermore, we investigate the sample complexity of the na\"ive elimination algorithm and prove that it nearly matches the worst-case sample complexity. Finally, we run experiments to verify our theoretical results and illustrate how $C$ and sampling budget affect the Pareto set, the returned ($\epsilon,\delta$)-PAC Pareto set, and the success of identification.

\c{C}a\u{g}{\i}n Ararat, Cem Tekin• 2021

Related benchmarks

Task	Dataset	Result
Vector Optimization (Acute Cone)	VS	Epsilon-F10.93	3
Vector Optimization (Obtuse Cone)	VS	Epsilon-F190	3
Vector Optimization (Right Cone)	VS	Epsilon-F195	3
Vector Optimization (Right Cone)	SnAr	epsilon-F189	3
Vector Optimization (Obtuse Cone)	BC	Epsilon-F198	3
Vector Optimization (Acute Cone)	BC	Epsilon-F189	3
Vector Optimization (Acute Cone)	LAC	Epsilon-F195	3
Vector Optimization (Acute Cone)	SnAr	Epsilon-F189	3
Vector Optimization (Obtuse Cone)	LAC	Epsilon-F1 Score93	3
Vector Optimization (Obtuse Cone)	SnAr	epsilon-F186	3

Showing 10 of 12 rows

Other info

Follow for update

@wizwand_team Discord