Error Detection on CoSPlan Maze-E
Top result: GPT-4o, Accuracy 0.403 (Dec 11, 2025)
Evaluation Results

Method         Setting                      Links     Accuracy
GPT-4o         Reasoning Strategy=Cha...    2025.12   0.403
GPT-4o         Reasoning Strategy=Sce...    2025.12   0.353
Intern-VLM     Reasoning Strategy=Sce...    2025.12   0.334
Intern-VLM     Reasoning Strategy=Cha...    2025.12   0.331
Intern-VLM     Reasoning Strategy=Van...    2025.12   0.328
Random                                      2025.12   0.261
Janus-pro-7B   Reasoning Strategy=Sce...    2025.12   0.21
Qwen2 VL-8B    Reasoning Strategy=Cha...    2025.12   0.208
Qwen2 VL-8B    Reasoning Strategy=Sce...    2025.12   0.207
Qwen2 VL-8B    Reasoning Strategy=Van...    2025.12   0.205
Janus-pro-7B   Reasoning Strategy=Van...    2025.12   0.205
Janus-pro-7B   Reasoning Strategy=Cha...    2025.12   0.191
CoG-VLM        Reasoning Strategy=Sce...    2025.12   0.133
CoG-VLM        Reasoning Strategy=Cha...    2025.12   0.084
CoG-VLM        Reasoning Strategy=Van...    2025.12   0.064