
Multi-Armed Bandits with Correlated Arms

About

We consider a multi-armed bandit framework in which the rewards obtained by pulling different arms are correlated. We develop a unified approach to leverage these reward correlations and present fundamental generalizations of classic bandit algorithms to the correlated setting, together with a unified proof technique to analyze the proposed algorithms. Rigorous analysis of C-UCB (the correlated-bandit version of Upper Confidence Bound) reveals that the algorithm pulls certain sub-optimal arms, termed non-competitive, only O(1) times, as opposed to the O(log T) pulls required by classic bandit algorithms such as UCB and Thompson Sampling. We present a regret lower bound and show that when arms are correlated through a latent random source, our algorithms achieve order-optimal regret. We validate the proposed algorithms via experiments on the MovieLens and Goodreads datasets, showing significant improvement over classical bandit algorithms.
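The key mechanism described above — pruning non-competitive arms before applying a UCB index — can be illustrated with a minimal sketch. This is not the authors' implementation: the `pseudo_reward[l][k]` functions (an upper bound on arm l's expected reward given that arm k returned reward r, assumed known from the correlation model), the initialization scheme, and the competitiveness test below are illustrative assumptions in the spirit of the abstract.

```python
import numpy as np


def c_ucb(pull, pseudo_reward, K, T):
    """Sketch of a C-UCB-style loop: run UCB restricted to 'competitive' arms.

    pull(k)              -> stochastic reward of arm k, assumed in [0, 1].
    pseudo_reward[l][k]  -> function r -> upper bound on E[reward of arm l]
                            given that arm k returned reward r (assumed known
                            from the correlation model; illustrative names).
    """
    counts = np.zeros(K)
    sums = np.zeros(K)
    phi_sum = np.zeros((K, K))  # phi_sum[l, k]: pseudo-rewards of l from pulls of k

    for t in range(1, T + 1):
        if t <= K:
            a = t - 1  # pull each arm once to initialize the estimates
        else:
            means = sums / counts
            k_best = int(np.argmax(means))          # empirically best arm
            phi = phi_sum / counts                  # empirical pseudo-reward means
            # An arm is competitive if its empirical pseudo-reward w.r.t. the
            # empirically best arm is at least that arm's empirical mean.
            competitive = phi[:, k_best] >= means[k_best]
            competitive[k_best] = True
            ucb = means + np.sqrt(2.0 * np.log(t) / counts)
            ucb[~competitive] = -np.inf             # skip non-competitive arms
            a = int(np.argmax(ucb))
        r = pull(a)
        counts[a] += 1
        sums[a] += r
        for l in range(K):
            phi_sum[l, a] += pseudo_reward[l][a](r)
    return counts, sums
```

For example, with two arms driven by a latent source X (arm 0 yields X, arm 1 yields 1 − X, with E[X] > 0.5), the pseudo-rewards are exact, the sub-optimal arm quickly becomes non-competitive, and almost all pulls go to the optimal arm — mirroring the O(1) pulls of non-competitive arms claimed above.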

Samarth Gupta, Shreyas Chaudhari, Gauri Joshi, Osman Yağan • 2019

