| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Planning in ACNO-MDPs | Frozen Lake 8x8 | Cumulative Reward | 3.53 | 20 |
| Planning in ACNO-MDPs | Frozen Lake 4x4 Hard | Reward Value | 8.95 | 20 |
| Planning in ACNO-MDPs | Frozen Lake 4x4 Default | Total Reward | 62.42 | 20 |
| Maze Solving | Frozen Lake | Success Rate (pass@2, 4x4) | 98.7 | 10 |
| Downstream Task | Frozen Lake standard_4x4 | Total Reward | 1 | 4 |
| Source Task Performance | Frozen Lake standard_4x4 | Critical State Safety Rate | 100 | 4 |
| Text Game | Frozen Lake (test) | Accuracy | 38.3 | 4 |
| Generating CFMDPs | Frozen Lake | Mean Execution Time (s) | 0.398 | 2 |
| Counterfactual Policy Evaluation | Frozen Lake Catastrophic Path | Lowest Cumulative Reward | -87 | 2 |
| Counterfactual Policy Evaluation | Frozen Lake Almost Catastrophic | Lowest Cumulative Reward | -68 | 2 |
| Counterfactual Policy Evaluation | Frozen Lake Slightly Suboptimal Path | Lowest Cumulative Reward | 41 | 2 |
| Counterfactual Policy Evaluation | Frozen Lake | Average Worst-Case V(s0) | 37.3 | 2 |
| Frozen Lake | Frozen Lake 128 samples (test) | Accuracy | 6.3 | 1 |