Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

About

We introduce GLR-klUCB, a novel algorithm for the piecewise iid non-stationary bandit problem with bounded rewards. This algorithm combines an efficient bandit algorithm, kl-UCB, with an efficient, parameter-free, changepoint detector, the Bernoulli Generalized Likelihood Ratio Test, for which we provide new theoretical guarantees of independent interest. Unlike previous non-stationary bandit algorithms using a change-point detector, GLR-klUCB does not need to be calibrated based on prior knowledge on the arms' means. We prove that this algorithm can attain a $O(\sqrt{TA \Upsilon_T\log(T)})$ regret in $T$ rounds on some "easy" instances, where A is the number of arms and $\Upsilon_T$ the number of change-points, without prior knowledge of $\Upsilon_T$. In contrast with recently proposed algorithms that are agnostic to $\Upsilon_T$, we perform a numerical study showing that GLR-klUCB is also very efficient in practice, beyond easy instances.

Lilian Besson, Emilie Kaufmann, Odalric-Ambrym Maillard, Julien Seznec• 2019

Related benchmarks

TaskDatasetResultRank
Multi-Armed BanditYahoo! Day 2
Average Time (s)560
7
Multi-Armed BanditYahoo! Day 3
Average Time (s)613
7
Multi-Armed BanditYahoo! Day 4
Avg Time (s)683
7
Multi-Armed BanditYahoo! Day 5
Average Latency (s)2.42e+3
7
Multi-Armed BanditYahoo! Day 6
Average Computational Time (s)707
7
Multi-Armed BanditYahoo! Day 7
Average Computational Time (s)1.53e+3
7
Multi-Armed BanditYahoo! Day 8
Average Latency (s)957
7
Multi-Armed BanditYahoo! dataset Day 9
Average Computational Time (s)971
7
Showing 8 of 8 rows

Other info

Follow for update