Nearly-Optimal Algorithm for Adversarial Kernelized Bandits
About
This paper studies kernelized bandits (also known as Gaussian process bandits) in an adversarial environment, where the reward functions in a known reproducing kernel Hilbert space (RKHS) may be adversarially chosen at each round. We show that the exponential-weight algorithm achieves $\tilde{O}(\sqrt{T \gamma_T})$ adversarial regret, where $T$ and $\gamma_T$ denote the number of total rounds and the maximum information gain, respectively. For squared exponential (SE) and $\nu$-Mat\'ern kernels, we also show algorithm-independent lower bounds that guarantee the optimality of our algorithm up to polylogarithmic factors. Furthermore, we present a computationally efficient variant of our algorithm using Nystr\"om approximation while maintaining nearly optimal regret guarantees.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Bandit Optimization | Setting 1 | Regret14.7 | 36 | |
| Bandit Optimization | Setting 2 | Regret11.1 | 36 | |
| Bandit Optimization | Setting 3 | Per-round Time Cost (s)0.2 | 28 | |
| Bandit Optimization | Setting 4 | Per-round Time Cost (s)0.3 | 28 | |
| Kernel Bandit | Adversarial Kernel Bandit Compact input domain X subset of R^d Theoretical bounds | Regret (nu-Matern Kernel)2 | 4 |