
On Kernelized Multi-armed Bandits

About

We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown. We provide two new Gaussian process-based algorithms for continuous bandit optimization: Improved GP-UCB (IGP-UCB) and GP-Thompson Sampling (GP-TS), and derive corresponding regret bounds. Specifically, the bounds hold when the expected reward function belongs to the reproducing kernel Hilbert space (RKHS) that naturally corresponds to the Gaussian process kernel used as input by the algorithms. Along the way, we derive a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension. Finally, experimental evaluation and comparisons to existing algorithms on synthetic and real-world environments are carried out, highlighting the favorable performance of the proposed strategies in many cases.

Sayak Ray Chowdhury, Aditya Gopalan • 2017
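
Below is a minimal sketch of the GP-UCB-style arm-selection rule the abstract refers to, on a discretized one-dimensional arm set. The kernel choice, the toy reward function, and the fixed exploration weight `beta` are assumptions for illustration; the paper's IGP-UCB uses a specific schedule for the exploration coefficient based on the maximum information gain, and GP-TS instead samples a function from the posterior and plays its maximizer.

```python
# Illustrative GP-UCB-style bandit sketch (not the paper's exact algorithm).
import numpy as np

def rbf_kernel(X, Y, lengthscale=0.2):
    """Squared-exponential kernel k(x, y) = exp(-||x - y||^2 / (2 l^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale ** 2))

def gp_posterior(X_obs, y_obs, X_query, noise_var=0.01):
    """GP posterior mean and standard deviation at the query points."""
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    K_star = rbf_kernel(X_query, X_obs)
    K_inv = np.linalg.inv(K)
    mean = K_star @ K_inv @ y_obs
    var = 1.0 - np.einsum("ij,jk,ik->i", K_star, K_inv, K_star)
    return mean, np.sqrt(np.maximum(var, 1e-12))

rng = np.random.default_rng(0)
arms = np.linspace(0.0, 1.0, 200)[:, None]   # continuous arm set, discretized
f = lambda x: -(x - 0.7) ** 2                # unknown reward function (toy example)
beta = 2.0                                   # exploration weight (simplified, fixed)
noise_std = 0.1

X_obs, y_obs = np.empty((0, 1)), np.empty(0)
for t in range(50):
    if len(X_obs) == 0:
        idx = rng.integers(len(arms))                 # first pull: arbitrary arm
    else:
        mean, std = gp_posterior(X_obs, y_obs, arms)
        idx = int(np.argmax(mean + beta * std))       # UCB rule: optimism under uncertainty
    x = arms[idx:idx + 1]
    y = f(x).item() + noise_std * rng.standard_normal()
    X_obs = np.vstack([X_obs, x])
    y_obs = np.append(y_obs, y)

print("best arm found:", X_obs[np.argmax(y_obs)].item())
```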

Related benchmarks

Task | Dataset | Metric | Result | Rank
Language Understanding | MMLU | Accuracy | 63.9 | 756
Question Answering | CommonsenseQA | Accuracy | 81.1 | 143
Question Answering | TruthfulQA | Accuracy | 76.1 | 73
Question Answering | ARC (test) | Accuracy | 63.4 | 67
Question Answering | TriviaQA Gen (test) | Accuracy | 74.7 | 31
Bayesian Optimization | 50 optimization problems: COCO, BoTorch, Bayesmark (aggregated) | Mean RP | 2.07 | 26
Bayesian Optimization | Rosenbrock-NS synthetic (test) | Computation Time (s) | 2.69e+3 | 5
Bayesian Optimization | HPOBench | Computation Time (s) | 3.34e+3 | 5
Bayesian Optimization | Ackley-HT synthetic (test) | Computation Time (s) | 2.77e+3 | 5
Bayesian Optimization | Ackley-NS synthetic (test) | Computation Time (s) | 2.80e+3 | 5

Showing 10 of 15 rows.
