Our new X account is live! Follow @wizwand_team for updates
Home
/
Benchmarks
Problem Solving and Unsolvability Detection on Maze Hard
Loading...
98
Solvable Accuracy
Deepseek-V3.2-R
-3.92
22.54
49
75.46
Dec 1, 2025
Solvable Accuracy
Unsolvable Detection Rate
Mean Score
Updated 4d ago
Evaluation Results
Method
Method
Links
Solvable Accuracy
Unsolvable Detection Rate
Mean Score
Deepseek-V3.2-R
Model Scale=V3.2-R
2025.12
98
98
98
Gemini-3
Model Scale=3
2025.12
96
91
93.5
GPT-5.1-Low
Model Scale=5.1-Low
2025.12
86
94
90
Qwen3-4B + UnsolvableRL
Model Scale=4B, Traini...
2025.12
65.8
89
77.9
Qwen3-4B Instruct
Model Scale=4B, Traini...
2025.12
13.3
64.5
38.9
Qwen3-1.7B Instruct
Model Scale=1.7B, Trai...
2025.12
0
80
40
Qwen3-1.7B + UnsolvableRL
Model Scale=1.7B, Trai...
2025.12
0
79
39.5
Feedback
Search any
task
Search any
task