On Kernelized Multi-armed Bandits
About
We consider the stochastic bandit problem over a continuous set of arms, where the expected reward function over the arms is fixed but unknown. We provide two new Gaussian-process-based algorithms for continuous bandit optimization: Improved GP-UCB (IGP-UCB) and GP-Thompson Sampling (GP-TS), and derive corresponding regret bounds. Specifically, the bounds hold when the expected reward function belongs to the reproducing kernel Hilbert space (RKHS) that naturally corresponds to the Gaussian process kernel used as input by the algorithms. Along the way, we derive a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension. Finally, experimental evaluation and comparisons to existing algorithms on synthetic and real-world environments highlight the favorable performance of the proposed strategies in many cases.
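To make the setup concrete, here is a minimal sketch of a GP-UCB-style bandit loop: maintain a Gaussian process posterior over the unknown reward function and, at each round, pull the arm maximizing an upper confidence bound (posterior mean plus a width term times the posterior standard deviation). This is not the authors' implementation; the RBF kernel, the noise level, the simplified confidence-width schedule `beta`, and the test function are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel between arm sets A (n,d) and B (m,d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale ** 2))

def gp_posterior(X, y, Xstar, noise=0.1):
    """GP posterior mean and standard deviation at candidate arms Xstar."""
    K = rbf_kernel(X, X) + noise ** 2 * np.eye(len(X))
    Ks = rbf_kernel(Xstar, X)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y
    # Prior variance of the RBF kernel is 1 at every arm.
    var = 1.0 - np.einsum('ij,jk,ik->i', Ks, Kinv, Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def gp_ucb(f, arms, T=50, noise=0.1, seed=0):
    """Run T rounds, each pulling the arm maximizing mean + sqrt(beta) * std."""
    rng = np.random.default_rng(seed)
    X = [arms[rng.integers(len(arms))]]          # random initial pull
    y = [f(X[0]) + noise * rng.standard_normal()]
    for t in range(1, T):
        mu, sd = gp_posterior(np.array(X), np.array(y), arms, noise)
        beta = 2.0 * np.log((t + 1) ** 2)        # simplified width schedule
        x = arms[np.argmax(mu + np.sqrt(beta) * sd)]
        X.append(x)
        y.append(f(x) + noise * rng.standard_normal())
    return np.array(X)

arms = np.linspace(0, 1, 101).reshape(-1, 1)     # discretized continuous arm set
f = lambda x: float(np.sin(3 * x[0]) * np.exp(-x[0]))  # unknown reward function
picks = gp_ucb(f, arms, T=60)
```

In the paper's setting the width term is chosen so that, when the reward function lies in the kernel's RKHS, the confidence band holds uniformly over rounds; the logarithmic schedule above is only a stand-in for that choice.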
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Understanding | MMLU | Accuracy | 63.9 | 756 |
| Question Answering | CommonsenseQA | Accuracy | 81.1 | 143 |
| Question Answering | TruthfulQA | Accuracy | 76.1 | 73 |
| Question Answering | ARC (test) | Accuracy | 63.4 | 67 |
| Question Answering | TriviaQA Gen (test) | Accuracy | 74.7 | 31 |
| Bayesian Optimization | 50 optimization problems: COCO, BoTorch, Bayesmark (aggregated) | Mean RP | 2.07 | 26 |
| Bayesian Optimization | Rosenbrock-NS synthetic (test) | Computation Time (s) | 2.69e+3 | 5 |
| Bayesian Optimization | HPOBench | Computation Time (s) | 3.34e+3 | 5 |
| Bayesian Optimization | Ackley-HT synthetic (test) | Computation Time (s) | 2.77e+3 | 5 |
| Bayesian Optimization | Ackley-NS synthetic (test) | Computation Time (s) | 2.80e+3 | 5 |