Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

pen

Benchmarks

Task NameDataset NameSOTA ResultTrend
Off-policy evaluation for classification errorpen
Bias-0.095
15
Offline-to-online Reinforcement Learningpen
Regret5.3
12
Reinforcement Learningpen human
Normalized Return53.4
4
Reinforcement Learningpen cloned
Normalized Return58.9
4
Showing 4 of 4 rows