Problem Solving and Unsolvability Detection on Overall
[Chart: Solvable Accuracy, Unsolvable Detection Rate, and Overall Mean Score by method, Dec 1, 2025. Best Solvable Accuracy: Gemini-3, 97.4.]
Evaluation Results
Method | Model Scale | Date | Solvable Accuracy | Unsolvable Detection Rate | Overall Mean Score
--- | --- | --- | --- | --- | ---
Gemini-3 | 3 | 2025.12 | 97.4 | 84.1 | 90.8
Deepseek-V3.2-R | V3.2-R | 2025.12 | 88.4 | 84.4 | 86.1
Qwen3-4B + UnsolvableRL | 4B, Traini... | 2025.12 | 69.4 | 87.5 | 78.6
GPT-5.1-Low | 5.1-Low | 2025.12 | 45.9 | 66.6 | 56.2
Qwen3-4B Instruct | 4B, Traini... | 2025.12 | 43.4 | 38.8 | 41.1
Qwen3-1.7B + UnsolvableRL | 1.7B, Trai... | 2025.12 | 25.5 | 76.4 | 50.9
Qwen3-1.7B Instruct | 1.7B, Trai... | 2025.12 | 23.0 | 41.7 | 32.4
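The page does not define how Overall Mean Score is computed. A natural guess is the unweighted mean of Solvable Accuracy and Unsolvable Detection Rate. The Python sketch below checks that assumption against the table, allowing 0.1 for rounding of the one-decimal figures. It also computes the per-metric change that UnsolvableRL shows over each Qwen3 Instruct baseline. The helper names (`simple_mean`, `mismatches`, `rl_gain`) are hypothetical and are not part of the benchmark.

```python
from statistics import mean

# (method, solvable accuracy, unsolvable detection rate, reported overall mean score)
# Values copied from the Evaluation Results table.
RESULTS = [
    ("Gemini-3", 97.4, 84.1, 90.8),
    ("Deepseek-V3.2-R", 88.4, 84.4, 86.1),
    ("Qwen3-4B + UnsolvableRL", 69.4, 87.5, 78.6),
    ("GPT-5.1-Low", 45.9, 66.6, 56.2),
    ("Qwen3-4B Instruct", 43.4, 38.8, 41.1),
    ("Qwen3-1.7B + UnsolvableRL", 25.5, 76.4, 50.9),
    ("Qwen3-1.7B Instruct", 23.0, 41.7, 32.4),
]


def simple_mean(solvable: float, unsolvable: float) -> float:
    """Unweighted mean of the two rates (an assumed definition, not the benchmark's)."""
    return mean([solvable, unsolvable])


def mismatches(tolerance: float = 0.1) -> list[str]:
    """Methods whose reported overall score differs from the simple mean by more than tolerance.

    0.1 covers the worst case when the inputs and the reported score are each
    rounded to one decimal place.
    """
    return [
        name
        for name, solv, unsolv, overall in RESULTS
        if abs(simple_mean(solv, unsolv) - overall) > tolerance
    ]


def rl_gain(base: str, tuned: str) -> tuple[float, float, float]:
    """Per-metric difference (tuned minus base), rounded to one decimal."""
    rows = {name: (s, u, o) for name, s, u, o in RESULTS}
    return tuple(round(t - b, 1) for b, t in zip(rows[base], rows[tuned]))


if __name__ == "__main__":
    print("Rows not matching the simple mean:", mismatches())
    print("4B gain:", rl_gain("Qwen3-4B Instruct", "Qwen3-4B + UnsolvableRL"))
    print("1.7B gain:", rl_gain("Qwen3-1.7B Instruct", "Qwen3-1.7B + UnsolvableRL"))
```

Under this assumption, five of the seven rows agree within rounding. Deepseek-V3.2-R (simple mean 86.4 vs. reported 86.1) and Qwen3-4B + UnsolvableRL (78.45 vs. 78.6) do not, so the reported score is probably aggregated some other way; treat the simple mean as an approximation only. The UnsolvableRL gains come out to (26.0, 48.7, 37.5) at 4B and (2.5, 34.7, 18.5) at 1.7B for Solvable Accuracy, Unsolvable Detection Rate, and Overall Mean Score respectively.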