Plan Generation on 6 grid world domains Unseen Appearances
[Chart: Success Rate (Frozenlake) by method. Top result: VLMFP, 81.1 (Oct 3, 2025).]
Updated 1mo ago
Evaluation Results
| Method | Planning Approach | Links | Success Rate (Frozenlake) | Success Rate (Maze) | Success Rate (Sokoban) | Success Rate (Package) | Success Rate (Printer) | Success Rate (Overcooked) | Average Success Rate |
|---|---|---|---|---|---|---|---|---|---|
| VLMFP | VLMF... | 2025.10 | 81.1 | 82.8 | 25.1 | 53.4 | 61 | 21.3 | 54.1 |
| CodePDDL | Code... | 2025.10 | 77.1 | 74.9 | 0.4 | 19 | 22.1 | 0 | 32.3 |
| CoT | Chai... | 2025.10 | 28 | 16 | 13 | 3 | 12 | 0 | 12 |
| Direct | Dire... | 2025.10 | 24 | 15 | 9 | 3 | 9 | 0 | 10 |
| CoT | Chai... | 2025.10 | 10 | 2 | 0 | 0 | 0 | 0 | 2 |
| Direct | Dire... | 2025.10 | 8 | 2 | 0 | 0 | 0 | 0 | 1.7 |
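The page does not say how Average Success Rate is computed. The reported values are consistent with an unweighted mean of the six per-domain success rates, so the sketch below checks that assumption for two rows. The function name `mean_success_rate` and the rounding tolerance are illustrative choices, not part of the benchmark.

```python
# Per-domain success rates (%) copied from the results above, in the order
# Frozenlake, Maze, Sokoban, Package, Printer, Overcooked, paired with the
# reported Average Success Rate.
reported = {
    "VLMFP": ([81.1, 82.8, 25.1, 53.4, 61, 21.3], 54.1),
    "CodePDDL": ([77.1, 74.9, 0.4, 19, 22.1, 0], 32.3),
}

# Reported averages are rounded to one decimal place, so allow half a unit
# in the last digit plus a little floating-point slack.
TOLERANCE = 0.05 + 1e-6


def mean_success_rate(rates):
    """Unweighted mean across domains (an assumed definition of the average)."""
    return sum(rates) / len(rates)


def matches_reported(rates, reported_avg):
    """True if the unweighted mean agrees with the reported value after rounding."""
    return abs(mean_success_rate(rates) - reported_avg) <= TOLERANCE


for method, (rates, reported_avg) in reported.items():
    print(method, round(mean_success_rate(rates), 3), matches_reported(rates, reported_avg))
```

If the benchmark instead weighted domains by their number of test instances, the computed and reported values would generally diverge, so agreement here only supports the unweighted reading; it does not confirm it.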