
Adaptive Prior Selection in Gaussian Process Bandits with Thompson Sampling

About

Gaussian process (GP) bandits provide a powerful framework for black-box optimization of unknown functions. The assumed characteristics of the unknown function depend heavily on the chosen GP prior. Most work in the literature assumes that this prior is known, but in practice this seldom holds. Instead, practitioners often rely on maximum likelihood estimation to select the hyperparameters of the prior, which lacks theoretical guarantees. In this work, we propose two algorithms for joint prior selection and regret minimization in GP bandits based on GP Thompson sampling (GP-TS): Prior-Elimination GP-TS (PE-GP-TS), which disqualifies priors with poor predictive performance, and HyperPrior GP-TS (HP-GP-TS), which utilizes a bi-level Thompson sampling scheme. We theoretically analyze the algorithms and establish upper bounds on their respective regret. In addition, we demonstrate the effectiveness of our algorithms compared to the alternatives through extensive experiments with synthetic and real-world data.
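To make the prior-elimination idea concrete, here is a minimal toy sketch in the spirit of PE-GP-TS. It is an illustration only, not the paper's algorithm: each candidate prior (here, an RBF lengthscale) accumulates one-step-ahead predictive log-likelihood, priors that fall a fixed margin behind the best are eliminated, and a Thompson-sampling step is taken under a surviving prior. The scoring rule, the `margin` threshold, and the choice to act under the best-scoring prior are all assumptions made for this sketch; the paper's elimination criterion and selection rule may differ.

```python
import numpy as np

def rbf_kernel(A, B, ls):
    # Squared-exponential kernel; ls is the lengthscale hyperparameter.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / ls**2)

def gp_posterior(X, y, Xstar, ls, noise):
    # Exact GP posterior mean and covariance at query points Xstar.
    K = rbf_kernel(X, X, ls) + noise**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    Ks = rbf_kernel(X, Xstar, ls)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    V = np.linalg.solve(L, Ks)
    return Ks.T @ alpha, rbf_kernel(Xstar, Xstar, ls) - V.T @ V

def pe_gp_ts(f, arms, lengthscales, T, noise=0.1, margin=10.0, seed=0):
    """Toy prior-elimination GP Thompson sampling over a discrete arm set.

    Hypothetical sketch: priors whose cumulative predictive log-likelihood
    trails the best surviving prior by more than `margin` nats are dropped.
    """
    rng = np.random.default_rng(seed)
    active = list(range(len(lengthscales)))
    scores = np.zeros(len(lengthscales))
    X, y = [], []
    for t in range(T):
        if X:
            Xa, ya = np.asarray(X), np.asarray(y)
            # Thompson step: sample a function from the posterior under the
            # best-scoring surviving prior and play its argmax arm.
            best = max(active, key=lambda j: scores[j])
            mu, cov = gp_posterior(Xa, ya, arms, lengthscales[best], noise)
            fs = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(arms)))
            idx = int(np.argmax(fs))
        else:
            idx = int(rng.integers(len(arms)))
        x = arms[idx:idx + 1]
        yt = f(x[0]) + noise * rng.normal()
        if X:
            # Score every surviving prior on how well it predicted y_t,
            # then eliminate priors that fall too far behind the leader.
            for j in active:
                m, c = gp_posterior(Xa, ya, x, lengthscales[j], noise)
                var = noise**2 + max(float(c[0, 0]), 0.0)
                scores[j] += -0.5 * (np.log(2 * np.pi * var)
                                     + (yt - m[0])**2 / var)
            top = max(scores[j] for j in active)
            active = [j for j in active if scores[j] >= top - margin]
        X.append(x[0])
        y.append(yt)
    return np.asarray(y), active
```

The hyperprior variant (HP-GP-TS) would instead keep all priors and, each round, sample a prior from a posterior over priors before sampling a function from it; the elimination loop above is the simpler of the two schemes to illustrate.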

Jack Sandberg, Morteza Haghir Chehreghani · 2025

Related benchmarks

Task                 Dataset                      Metric                Result   Rank
Regret Minimization  Synthetic Kernel             Average Total Regret  39.2     10
Regret Minimization  Synthetic Subspace           Average Total Regret  88.3     10
Regret Minimization  Synthetic Lengthscale        Average Total Regret  31.4     10
Regret Minimization  PeMS                         Average Total Regret  1.21e+3  6
Regret Minimization  PNW Precip 1980-1994 (test)  Average Total Regret  167.7    6
Regret Minimization  Intel                        Average Total Regret  54.1     6
