An Information-Theoretic Analysis of Thompson Sampling

About

We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance.
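For readers unfamiliar with the algorithm being analyzed, the following is a minimal sketch of Thompson sampling on a Bernoulli multi-armed bandit, one simple instance of learning from partial feedback. It is an illustration only and is not taken from the paper: the function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the example arm means are all assumptions chosen for the demonstration.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling (hypothetical helper).

    Returns (cumulative expected regret, number of pulls per arm).
    """
    rng = random.Random(seed)
    k = len(true_means)
    # Assumed prior: Beta(1, 1), i.e. uniform, on each arm's mean reward.
    alpha = [1] * k  # 1 + observed successes
    beta = [1] * k   # 1 + observed failures
    pulls = [0] * k
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        # Draw one plausible mean per arm from the current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        # Play the arm that is optimal under the sampled means.
        arm = max(range(k), key=samples.__getitem__)
        # Only the chosen arm's reward is observed (partial feedback).
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        # Expected regret of this round relative to the best arm.
        regret += best_mean - true_means[arm]

    return regret, pulls


if __name__ == "__main__":
    total_regret, counts = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
    print("cumulative regret:", round(total_regret, 2))
    print("pulls per arm:", counts)
```

Each round samples from the posterior and acts greedily on the sample, so an arm is chosen with the posterior probability that it is optimal. The distribution of the optimal action is the quantity whose entropy appears in the regret bounds described above.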

Daniel Russo, Benjamin Van Roy • 2014

Related benchmarks

Task | Dataset | Result | Rank
Cumulative regret minimization | 5-FU clinical dosing simulation N=12 cycles | Cumulative Regret: 5.89 | 15
