
RiverSwim

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Empirical coverage estimation | RiverSwim | Q^π(1, 0) | 0.948 | 120 |
| Empirical coverage estimation | RiverSwim T=50 90% nominal coverage | Q* (1, 0) | 91.3 | 20 |
| Optimal Policy Recovery (Empirical Coverage) | RiverSwim T=50 nominal 95% coverage | Q* Recovery (s=1, a=0) | 95.7 | 20 |
| State-Value coverage estimation | RiverSwim mostly-right target policy T=50 | V(s=1) | 0.523 | 20 |
| Action-Value coverage estimation | RiverSwim mostly-right target policy T=50 | Q-Value Estimate (s=1, a=0) | 0.523 | 20 |
| Empirical Coverage Estimation | RiverSwim episode length T = 10 (nominal 95% coverage) | Q* (1, 0) | 89.9 | 20 |
| Off-Policy Evaluation | RiverSwim mostly-left policy, T=50 | Qπ(1, 0) Coverage | 55.7 | 20 |
| State Value Estimation Coverage | RiverSwim | Value Estimate State 1 | 0.952 | 20 |
| State-Action Value Estimation Coverage | RiverSwim | Q-Value Estimate (s=1, a=0) | 0.952 | 20 |
| Empirical coverage estimation | RiverSwim episode length T=100 | Q* Coverage (1, 0) | 94.9 | 15 |
| Empirical Coverage Estimation | RiverSwim episode length T=100, nominal coverage 50% | Q* (1, 0) Coverage | 0.529 | 15 |
| State-Value Coverage Estimation | RiverSwim (T=100) | V*(1) | 0.907 | 15 |
| Action-Value Coverage Estimation | RiverSwim T=100 | Q*(1,0) | 0.907 | 15 |
| Offline Reinforcement Learning | RiverSwim tabular | Returns | 100 | 15 |
| Reinforcement Learning | RiverSwim | Cumulative Reward | 3.3 | 4 |