Two dim reward function

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Regret Minimization | Two dim reward function synthetic (test) | Oracle Regret: 2,589.32 | 9
Regret Minimization | Two dim reward function weak adversaries Appendix A.7 (test) | Oracle Regret: 2,589.32 | 9