Planning on 8x8 two-room gridworld (test)
[Chart: Validity (%) over time. Top entry: L-ICL, 0.89 (Jan 30, 2026). Selectable metrics: Validity (%), Success Rate (%), Optimality (%).]
Updated 1mo ago
Evaluation Results
| Method           | Configuration              | Date    | Validity (%) | Success Rate (%) | Optimality (%) |
|------------------|----------------------------|---------|--------------|------------------|----------------|
| L-ICL            | Base Model=DeepSeek V3...  | 2026.01 | 0.89         | 0.89             | 0.77           |
| Self-Consistency | Base Model=DeepSeek V3...  | 2026.01 | 0.59         | 0.45             | 0.43           |
| Self-Refine      | Base Model=DeepSeek V3...  | 2026.01 | 0.51         | 0.44             | 0.38           |
| ReAct            | Base Model=DeepSeek V3...  | 2026.01 | 0.48         | 0.41             | 0.37           |
| PTP              | Base Model=DeepSeek V3...  | 2026.01 | 0.4          | 0.33             | 0.28           |
| RAG-ICL          | Base Model=DeepSeek V3...  | 2026.01 | 0.21         | 0.09             | 0.09           |
| RAG-ICL          | Base Model=DeepSeek V3...  | 2026.01 | 0.2          | 0.06             | 0.06           |
| Zero-Shot        | Base Model=DeepSeek V3...  | 2026.01 | 0.16         | 0                | 0              |