Frozen Lake

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Planning in ACNO-MDPs | Frozen Lake 8x8 | Cumulative Reward: 3.53 | 20 |
| Planning in ACNO-MDPs | Frozen Lake 4x4 Hard | Reward Value: 8.95 | 20 |
| Planning in ACNO-MDPs | Frozen Lake 4x4 Default | Total Reward: 62.42 | 20 |
| Logical Reasoning | Frozen Lake | Pass@1 Success Rate: 69 | 10 |
| Maze Solving | Frozen Lake | Success Rate (pass@2, 4x4): 98.7 | 10 |
| Multi-modal SLAM | Frozen Lake | ATE: 1.985 | 9 |
| Visual Navigation | Frozen-Lake 8x8 Grid | Accuracy: 12.5 | 9 |
| Visual Navigation | Frozen-Lake 6x6 Grid | Accuracy: 25 | 9 |
| Visual Navigation | Frozen-Lake 4x4 Grid | Accuracy: 55 | 9 |
| Policy Selection | Frozen Lake Random 1 | Max Score: 0.37 | 8 |
| Downstream Task | Frozen Lake standard_4x4 | Total Reward: 1 | 4 |
| Source Task Performance | Frozen Lake standard_4x4 | Critical State Safety Rate: 100 | 4 |
| Text Game | Frozen Lake (test) | Accuracy: 38.3 | 4 |
| Generating CFMDPs | Frozen Lake | Mean Execution Time (s): 0.398 | 2 |
| Counterfactual Policy Evaluation | Frozen Lake Catastrophic Path | Lowest Cumulative Reward: -87 | 2 |
| Counterfactual Policy Evaluation | Frozen Lake Almost Catastrophic | Lowest Cumulative Reward: -68 | 2 |
| Counterfactual Policy Evaluation | Frozen Lake Slightly Suboptimal Path | Lowest Cumulative Reward: 41 | 2 |
| Counterfactual Policy Evaluation | Frozen Lake | Average Worst-Case V(s0): 37.3 | 2 |
| Frozen Lake | Frozen Lake 128 samples (test) | Accuracy: 6.3 | 1 |