Policy Optimization on Multi-Armed Bandits

Sample Complexity: -7

Log-barrier

Updated 4mo ago

Evaluation Results

Method	Links	Sample Complexity
Log-barrier 2026.03		-7
Log-barrier, Clipping 2026.03		-6
Log-barrier, Momentum 2026.03		-4.5
Vanilla PG 2026.03		-3
Entropy regularization 2026.03		-2
Entropy regularized NPG 2026.03		-2
Vanilla PG (SGB) 2026.03		-1
Log-barrier 2026.03		-1