KL-regularized Bandits

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Regret Minimization | KL-regularized Bandits | Sample Complexity: 2 | 2 |
| Regret Minimization | KL-regularized Bandits Data Coverage | Regret: 2 | 1 |
| Regret Minimization | KL-regularized Bandits Preference w/ Linear Reward | Regret: 2 | 1 |
| Regret Minimization | KL-regularized Bandits Eluder Dimension | Metric: - | 0 |
| Regret Minimization | KL-regularized Bandits Preference w/ Eluder Dimension | Metric: - | 0 |