Logical Reasoning on Maze
Best result: 98 Pass@1, Kimi K2.5 + ETCHR
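Pass@1 is conventionally the percentage of problems solved on the first attempt. Because every entry below is run at Temperature=0 (greedy decoding), each model effectively gets one deterministic attempt per maze. The sketch below is a minimal illustration that assumes each maze item is scored simply as solved or unsolved; this page does not describe the actual scoring procedure, and the `pass_at_1` function and the outcome list are hypothetical.

```python
from typing import Sequence


def pass_at_1(solved: Sequence[bool]) -> float:
    """Percentage of items solved on the single (greedy) attempt.

    Hypothetical helper: assumes one pass/fail outcome per maze item.
    """
    if not solved:
        raise ValueError("need at least one item")
    # sum() over booleans counts the solved items.
    return 100.0 * sum(solved) / len(solved)


# Hypothetical outcomes for 4 maze items (not data from this leaderboard).
print(pass_at_1([True, True, False, True]))  # 75.0
```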
[Figure: Pass@1 over time; latest entry May 22, 2026]
Evaluation Results
Method | Settings | Date | Pass@1 (%)
Kimi K2.5 + ETCHR | Temperature=0 | 2026.05 | 98
Kimi K2.5 | Temperature=0 | 2026.05 | 95.5
Gemini-3.1-Flash-Lite + ETCHR | Temperature=0 | 2026.05 | 51.5
Gemini-3.1-Flash-Lite | Temperature=0 | 2026.05 | 40
Qwen3-VL-8B + ETCHR | Temperature=0 | 2026.05 | 38.5
Qwen3-VL-8B | Temperature=0 | 2026.05 | 27.5
ThinkMorph-7B | Temperature=0, Max ima... | 2026.05 | 6.5
DeepEyesV2 | Temperature=0, Max too... | 2026.05 | 0.5
Thyme | Temperature=0, Max too... | 2026.05 | 0
Bagel-Zebra-CoT | Temperature=0, Max ima... | 2026.05 | 0
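The table pairs three base models with their "+ ETCHR" variants, so the effect of ETCHR on this benchmark can be read as the difference in Pass@1 between each pair. The sketch below computes those differences from the scores copied out of the table; the `etchr_gains` function and the pairing-by-name-suffix rule are illustrative assumptions, not part of the leaderboard.

```python
# Pass@1 scores (%) copied from the Evaluation Results table above.
scores = {
    "Kimi K2.5 + ETCHR": 98.0,
    "Kimi K2.5": 95.5,
    "Gemini-3.1-Flash-Lite + ETCHR": 51.5,
    "Gemini-3.1-Flash-Lite": 40.0,
    "Qwen3-VL-8B + ETCHR": 38.5,
    "Qwen3-VL-8B": 27.5,
}


def etchr_gains(scores: dict[str, float]) -> dict[str, float]:
    """Absolute Pass@1 gain (percentage points) of '<base> + ETCHR' over '<base>'.

    Hypothetical helper: pairs entries by stripping the ' + ETCHR' suffix.
    """
    suffix = " + ETCHR"
    gains = {}
    for name, value in scores.items():
        if name.endswith(suffix):
            base = name[: -len(suffix)]
            if base in scores:  # skip variants with no matching base entry
                gains[base] = round(value - scores[base], 2)
    return gains


for base, gain in etchr_gains(scores).items():
    print(f"{base}: {gain:+.1f} points")
# Kimi K2.5: +2.5 points
# Gemini-3.1-Flash-Lite: +11.5 points
# Qwen3-VL-8B: +11.0 points
```

The absolute gain is smallest for Kimi K2.5, whose base score (95.5) already leaves little headroom below 100.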