| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Empirical coverage estimation | RiverSwim | Q^π(1, 0) | 0.948 | 120 |
| Empirical coverage estimation | RiverSwim T=50 90% nominal coverage | Q* (1, 0) | 91.3 | 20 |
| Optimal Policy Recovery (Empirical Coverage) | RiverSwim T=50 nominal 95% coverage | Q* Recovery (s=1, a=0) | 95.7 | 20 |
| State-Value coverage estimation | RiverSwim mostly-right target policy T=50 | V(s=1) | 0.523 | 20 |
| Action-Value coverage estimation | RiverSwim mostly-right target policy T=50 | Q-Value Estimate (s=1, a=0) | 0.523 | 20 |
| Empirical Coverage Estimation | RiverSwim episode length T = 10 (nominal 95% coverage) | Q* (1, 0) | 89.9 | 20 |
| Off-Policy Evaluation | RiverSwim mostly-left policy, T=50 | Q^π(1, 0) Coverage | 55.7 | 20 |
| State Value Estimation Coverage | RiverSwim | Value Estimate State 1 | 0.952 | 20 |
| State-Action Value Estimation Coverage | RiverSwim | Q-Value Estimate (s=1, a=0) | 0.952 | 20 |
| Empirical coverage estimation | RiverSwim episode length T=100 | Q* Coverage (1, 0) | 94.9 | 15 |
| Empirical Coverage Estimation | RiverSwim episode length T=100, nominal coverage 50% | Q* (1, 0) Coverage | 0.529 | 15 |
| State-Value Coverage Estimation | RiverSwim (T=100) | V*(1) | 0.907 | 15 |
| Action-Value Coverage Estimation | RiverSwim T=100 | Q*(1,0) | 0.907 | 15 |
| Offline Reinforcement Learning | RiverSwim tabular | Returns | 100 | 15 |
| Reinforcement Learning | RiverSwim | Cumulative Reward | 3.3 | 4 |