Problem Solving and Unsolvability Detection on Overall
[Chart: Solvable Accuracy, Unsolvable Detection Rate, Overall Mean Score. Highlighted point: Gemini-3, Solvable Accuracy 97.4, Dec 1, 2025.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Solvable Accuracy | Unsolvable Detection Rate | Overall Mean Score |
|---|---|---|---|---|---|
| Gemini-3 | Model Scale=3 | 2025.12 | 97.4 | 84.1 | 90.8 |
| Deepseek-V3.2-R | Model Scale=V3.2-R | 2025.12 | 88.4 | 84.4 | 86.1 |
| Qwen3-4B + UnsolvableRL | Model Scale=4B, Traini... | 2025.12 | 69.4 | 87.5 | 78.6 |
| GPT-5.1-Low | Model Scale=5.1-Low | 2025.12 | 45.9 | 66.6 | 56.2 |
| Qwen3-4B Instruct | Model Scale=4B, Traini... | 2025.12 | 43.4 | 38.8 | 41.1 |
| Qwen3-1.7B + UnsolvableRL | Model Scale=1.7B, Trai... | 2025.12 | 25.5 | 76.4 | 50.9 |
| Qwen3-1.7B Instruct | Model Scale=1.7B, Trai... | 2025.12 | 23 | 41.7 | 32.4 |