Scientific Reasoning on Aggregate GPQA, HLE, MMLU-Pro
[Chart: Average Score. Highlighted entry: Dr.SCI-4B-think, 44.6, Feb 9, 2026]
Updated 3d ago
Evaluation Results
Method                     Model Category   Date      Average Score
Dr.SCI-4B-think            Thinkin...       2026.02   44.6
o1-mini                    Thinkin...       2026.02   43.4
R1-0528-Qwen3-8B           Thinkin...       2026.02   41.8
R1-Distill-Qwen-32B        Thinkin...       2026.02   41
Dr.SCI-4B-instruct         Instruc...       2026.02   40.2
GPT-4o                     Instruc...       2026.02   39
Qwen3-14B-MegaScience      Instruc...       2026.02   39
Qwen3-4B thinking          Thinkin...       2026.02   38.9
General-Reasoner-Qw3-14B   Instruc...       2026.02   38.9
QwQ-32B                    Thinkin...       2026.02   37.4
Qwen3-8B-MegaScience       Instruc...       2026.02   35.5
Qwen3-8B-VeriFree          Instruc...       2026.02   34.6
Qwen3-4B-VeriFree          Instruc...       2026.02   32.3
General-Reasoner-4B        Instruc...       2026.02   31.6
Qwen3-4B-MegaScience       Instruc...       2026.02   29.3
Qwen3-4B non-thinking      Instruc...       2026.02   29.2
Qwen3-4B-Base              Base, M...       2026.02   24.5