Problem Solving and Unsolvability Detection on Maze Easy
Chart: Accuracy (Solvable) over time; top entry Deepseek-V3.2-R at 100 (Dec 1, 2025). Available metrics: Accuracy (Solvable), Detection Rate (Unsolvable), Overall Mean Score.
Updated 4d ago
Evaluation Results
Method                     Model Scale    Date     Accuracy (Solvable)  Detection Rate (Unsolvable)  Overall Mean Score
Deepseek-V3.2-R            V3.2-R         2025.12  100                  98.9                         99.5
Gemini-3                   3              2025.12  99                   94.6                         96.8
Qwen3-4B + UnsolvableRL    4B, Traini...  2025.12  96.5                 98.9                         97.7
GPT-5.1-Low                5.1-Low        2025.12  95                   100                          97.5
Qwen3-4B Instruct          4B, Traini...  2025.12  32.7                 55.6                         44.1
Qwen3-1.7B Instruct        1.7B, Trai...  2025.12  0                    85.6                         42.8
Qwen3-1.7B + UnsolvableRL  1.7B, Trai...  2025.12  0                    95.2                         47.6
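In every row, the Overall Mean Score matches the unweighted average of Accuracy (Solvable) and Detection Rate (Unsolvable) to within one-decimal rounding. For example, (100 + 98.9) / 2 = 99.45, reported as 99.5. The page does not state this definition, so treat it as an assumption. The Python sketch below transcribes the table and checks each reported score against that assumed formula, allowing up to 0.05 of rounding difference. It then ranks methods by the reported score. `RESULTS`, `overall_mean`, `matches_reported`, and `rank_by_overall` are hypothetical helper names, not part of any leaderboard API.

```python
# Scores transcribed from the Evaluation Results table:
# (Accuracy (Solvable), Detection Rate (Unsolvable), reported Overall Mean Score).
RESULTS: dict[str, tuple[float, float, float]] = {
    "Deepseek-V3.2-R": (100.0, 98.9, 99.5),
    "Gemini-3": (99.0, 94.6, 96.8),
    "Qwen3-4B + UnsolvableRL": (96.5, 98.9, 97.7),
    "GPT-5.1-Low": (95.0, 100.0, 97.5),
    "Qwen3-4B Instruct": (32.7, 55.6, 44.1),
    "Qwen3-1.7B Instruct": (0.0, 85.6, 42.8),
    "Qwen3-1.7B + UnsolvableRL": (0.0, 95.2, 47.6),
}


def overall_mean(solvable_acc: float, unsolvable_det: float) -> float:
    """Unweighted mean of the two metrics (assumed definition, not stated on the page)."""
    return (solvable_acc + unsolvable_det) / 2


def matches_reported(tolerance: float = 0.05 + 1e-9) -> dict[str, bool]:
    """Whether the assumed formula reproduces each reported score.

    Reported scores are rounded to one decimal, so a difference of up to 0.05
    (plus a little floating-point slack) is accepted.
    """
    return {
        method: abs(overall_mean(acc, det) - reported) <= tolerance
        for method, (acc, det, reported) in RESULTS.items()
    }


def rank_by_overall() -> list[str]:
    """Method names ordered by reported Overall Mean Score, highest first."""
    return sorted(RESULTS, key=lambda method: RESULTS[method][2], reverse=True)


if __name__ == "__main__":
    print(matches_reported())
    print(rank_by_overall())
```

Ranked by the reported Overall Mean Score, Qwen3-4B + UnsolvableRL (97.7) comes second, ahead of GPT-5.1-Low (97.5) and Gemini-3 (96.8), even though it appears third in the table.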