Policy Optimization on Multi-Armed Bandits
[Chart: Sample Complexity by submission date. Top result: Log-barrier, -7 (Mar 16, 2026)]
Evaluation Results
| Method | Setting | Date | Sample Complexity |
|---|---|---|---|
| Log-barrier | Learning Rate alpha=O(... | 2026.03 | -7 |
| Log-barrier, Clipping | Learning Rate alpha=O(... | 2026.03 | -6 |
| Log-barrier, Momentum | Learning Rate alpha=O(... | 2026.03 | -4.5 |
| Vanilla PG | Learning Rate alpha=O(... | 2026.03 | -3 |
| Entropy regularization | Learning Rate alpha=O(... | 2026.03 | -2 |
| Entropy regularized NPG | Learning Rate alpha=O(... | 2026.03 | -2 |
| Vanilla PG (SGB) | Learning Rate alpha=O(... | 2026.03 | -1 |
| Log-barrier | Learning Rate alpha=O(... | 2026.03 | -1 |
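The leaderboard lists method names only. For orientation, here is a minimal sketch of three of the listed variants (vanilla policy gradient, entropy regularization, and log-barrier regularization) on a K-armed bandit. It assumes a softmax policy over logits theta, Gaussian rewards, the on-policy REINFORCE gradient estimator, and a log-barrier term of the form (1/K) sum_a log pi_a. The function names (`log_softmax`, `regularizer_grad`, `stochastic_pg`) and every parameter value are hypothetical illustrations. They are not the configurations behind the table entries, and the sketch does not reproduce the sample-complexity figures above.

```python
import numpy as np


def log_softmax(theta: np.ndarray) -> np.ndarray:
    """Numerically stable log pi_theta for a softmax policy over K arms."""
    z = theta - theta.max()
    return z - np.log(np.exp(z).sum())


def regularizer_grad(theta: np.ndarray, reg: str, lam: float) -> np.ndarray:
    """Exact gradient of lam * R(pi_theta) with respect to the logits theta.

    The regularizer forms are assumptions chosen for illustration.
    """
    logpi = log_softmax(theta)
    pi = np.exp(logpi)
    if reg == "entropy":
        # R = -sum_a pi_a log pi_a  =>  dR/dtheta_b = -pi_b (log pi_b - sum_a pi_a log pi_a)
        return lam * (-pi * (logpi - pi @ logpi))
    if reg == "log_barrier":
        # R = (1/K) sum_a log pi_a  =>  dR/dtheta_b = 1/K - pi_b
        return lam * (1.0 / len(theta) - pi)
    return np.zeros_like(theta)  # "none": vanilla policy gradient


def stochastic_pg(means, steps=5000, lr=0.1, reg="none", lam=0.0,
                  noise=0.1, seed=0) -> np.ndarray:
    """On-policy REINFORCE with a softmax policy on a Gaussian K-armed bandit.

    Hypothetical setup: arm a pays means[a] plus N(0, noise^2) noise.
    Returns the final action distribution pi_theta.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    theta = np.zeros(len(means))  # uniform initial policy
    for _ in range(steps):
        pi = np.exp(log_softmax(theta))
        a = rng.choice(len(means), p=pi)  # sample one arm from the current policy
        reward = means[a] + noise * rng.standard_normal()
        # Unbiased estimate of grad(pi . means): reward * grad log pi(a) = reward * (e_a - pi)
        grad = -reward * pi
        grad[a] += reward
        # Ascent step on the (optionally regularized) expected reward
        theta += lr * (grad + regularizer_grad(theta, reg, lam))
    return np.exp(log_softmax(theta))


if __name__ == "__main__":
    means = [0.1, 0.5, 0.9]
    for reg, lam in [("none", 0.0), ("entropy", 0.05), ("log_barrier", 0.01)]:
        print(reg, np.round(stochastic_pg(means, reg=reg, lam=lam), 3))
```

In this sketch the regularizers differ only in the extra logit gradient they add. The log-barrier term pushes every pi_a back toward 1/K with strength lam, which keeps suboptimal arms from collapsing to zero probability. The entropy term instead biases the regularized optimum toward pi proportional to exp(means / lam).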