Problem Solving and Unsolvability Detection on Hitori
[Chart: Accuracy (Solvable), Detection Rate (Unsolvable), and Mean Score by date (x-axis tick: Dec 1, 2025). Top Accuracy (Solvable): 98, Gemini-3.]
Updated 4d ago
Evaluation Results
| Method | Details | Date | Accuracy (Solvable) | Detection Rate (Unsolvable) | Mean Score |
| Gemini-3 | Model Scale=3 | 2025.12 | 98 | 100 | 99 |
| Deepseek-V3.2-R | Model Scale=V3.2-R | 2025.12 | 70 | 86 | 78 |
| Qwen3-4B + UnsolvableRL | Model Scale=4B, Traini... | 2025.12 | 63.5 | 94.5 | 79 |
| Qwen3-4B Instruct | Model Scale=4B, Traini... | 2025.12 | 34.5 | 6.5 | 20.5 |
| Qwen3-1.7B + UnsolvableRL | Model Scale=1.7B, Trai... | 2025.12 | 11 | 59 | 35 |
| Qwen3-1.7B Instruct | Model Scale=1.7B, Trai... | 2025.12 | 8 | 21 | 14.5 |
| GPT-5.1-Low | Model Scale=5.1-Low | 2025.12 | 6 | 12 | 9 |
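In every row of the results above, the Mean Score equals the arithmetic mean of Accuracy (Solvable) and Detection Rate (Unsolvable). The page does not state this formula, so treat it as an inference from the listed numbers. The Python sketch below checks that relationship and re-ranks the methods by Mean Score. The names `ROWS`, `mean_score`, `check_rows`, and `rank_by_mean` are hypothetical helpers introduced for illustration only.

```python
# Rows copied from the results above:
# (method, accuracy_solvable, detection_rate_unsolvable, reported_mean_score)
ROWS = [
    ("Gemini-3", 98.0, 100.0, 99.0),
    ("Deepseek-V3.2-R", 70.0, 86.0, 78.0),
    ("Qwen3-4B + UnsolvableRL", 63.5, 94.5, 79.0),
    ("Qwen3-4B Instruct", 34.5, 6.5, 20.5),
    ("Qwen3-1.7B + UnsolvableRL", 11.0, 59.0, 35.0),
    ("Qwen3-1.7B Instruct", 8.0, 21.0, 14.5),
    ("GPT-5.1-Low", 6.0, 12.0, 9.0),
]


def mean_score(accuracy, detection):
    """Assumed formula: plain average of the two metrics."""
    return (accuracy + detection) / 2


def check_rows(rows, tol=1e-9):
    """Return the methods whose reported mean differs from the assumed formula."""
    return [
        name
        for name, acc, det, reported in rows
        if abs(mean_score(acc, det) - reported) > tol
    ]


def rank_by_mean(rows):
    """Sort rows by reported Mean Score, highest first."""
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    print("Rows inconsistent with the assumed formula:", check_rows(ROWS))
    for name, acc, det, reported in rank_by_mean(ROWS):
        print(f"{name}: {reported}")
```

The page lists methods in descending order of Accuracy (Solvable). Ranked by Mean Score instead, Qwen3-4B + UnsolvableRL (79) moves ahead of Deepseek-V3.2-R (78).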