Aggregated Logical Reasoning on Overall Solvable
Best result: Deepseek-V3.2-R, 67.3 Accuracy (Dec 1, 2025)
[Chart: Accuracy over time]
Updated 4d ago
Evaluation Results
Method                                 Details                    Date      Accuracy
Deepseek-V3.2-R                        Model=Deepseek-V3.2-R      2025.12   67.3
Gemini-3.0-Pro                         Model=Gemini-3.0-Pro       2025.12   54.3
GPT-5.1-Low                            Model=GPT-5.1-Low          2025.12   38.5
Qwen3-4B-Instruct + UnsolRL-Final      Base Model=Qwen3-4B-In...  2025.12   17.4
Qwen3-4B-Instruct                      Model=Qwen3-4B-Instruct    2025.12   11.8
Qwen3-1.7B-Instruct                    Model=Qwen3-1.7B-Instruct  2025.12   4.9
Qwen3-1.7B-Instruct + UnsolRL-Final    Base Model=Qwen3-1.7B-...  2025.12   3.3