Spatial Reasoning on bAbI (test)
Top result: 24 Accuracy (Self-Consistency)
[Chart: Accuracy by date; plotted date: Feb 2, 2026]
Updated 4d ago
Evaluation Results
Method                     Details                     Date     Accuracy
Self-Consistency           Backbone=Qwen-2.5 (14B...   2026.02  24
Self-Consistency           Backbone=Gemma-3 (27B)...   2026.02  22
Chain of Simulation (CoS)  Backbone=Gemma-3 (27B)...   2026.02  22
Chain of Simulation (CoS)  Backbone=Qwen-2.5 (14B...   2026.02  22
Direct                     Backbone=Gemma-3 (27B)...   2026.02  20
Chain-of-Thought           Backbone=Gemma-3 (27B)...   2026.02  20
Direct                     Backbone=Qwen-2.5 (14B...   2026.02  20
Chain-of-Thought           Backbone=Qwen-2.5 (14B...   2026.02  20
Structured CoT             Backbone=Qwen-2.5 (14B...   2026.02  20
Structured CoT             Backbone=Gemma-3 (27B)...   2026.02  18
Chain of Simulation (CoS)  Backbone=LLaMA-3.1 (8B...   2026.02  18
Chain of Simulation (CoS)  Backbone=Mistral (7B),...   2026.02  14
Direct                     Backbone=LLaMA-3.1 (8B...   2026.02  2
Structured CoT             Backbone=LLaMA-3.1 (8B...   2026.02  2
Direct                     Backbone=Mistral (7B),...   2026.02  2
Chain-of-Thought           Backbone=LLaMA-3.1 (8B...   2026.02  0
Self-Consistency           Backbone=LLaMA-3.1 (8B...   2026.02  0
Chain-of-Thought           Backbone=Mistral (7B),...   2026.02  0
Structured CoT             Backbone=Mistral (7B),...   2026.02  0
Self-Consistency           Backbone=Mistral (7B),...   2026.02  0