Frozen Lake

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Planning in ACNO-MDPs | Frozen Lake 8x8 | Cumulative Reward: 3.53 | 20 |
| Planning in ACNO-MDPs | Frozen Lake 4x4 Hard | Reward Value: 8.95 | 20 |
| Planning in ACNO-MDPs | Frozen Lake 4x4 Default | Total Reward: 62.42 | 20 |
| Maze Solving | Frozen Lake | Success Rate (pass@2, 4x4): 98.7 | 10 |
| Downstream Task | Frozen Lake standard_4x4 | Total Reward: 1 | 4 |
| Source Task Performance | Frozen Lake standard_4x4 | Critical State Safety Rate: 100 | 4 |
| Text Game | Frozen Lake (test) | Accuracy: 38.3 | 4 |
| Generating CFMDPs | Frozen Lake | Mean Execution Time (s): 0.398 | 2 |
| Counterfactual Policy Evaluation | Frozen Lake Catastrophic Path | Lowest Cumulative Reward: -87 | 2 |
| Counterfactual Policy Evaluation | Frozen Lake Almost Catastrophic | Lowest Cumulative Reward: -68 | 2 |
| Counterfactual Policy Evaluation | Frozen Lake Slightly Suboptimal Path | Lowest Cumulative Reward: 41 | 2 |
| Counterfactual Policy Evaluation | Frozen Lake | Average Worst-Case V(s0): 37.3 | 2 |
| Frozen Lake | Frozen Lake 128 samples (test) | Accuracy: 6.3 | 1 |