Scientific Reasoning on HLE
[Figure: pass@1 over time. Top entry: Dr.SCI-4B-think, 612 (Feb 9, 2026). Metrics: pass@1, avg@10.]
Evaluation Results
| Method | Model Category | Date | pass@1 | avg@10 |
|---|---|---|---|---|
| Dr.SCI-4B-think | Thinkin... | 2026.02 | 612 | - |
| o1-mini | Thinkin... | 2026.02 | 568 | - |
| R1-0528-Qwen3-8B | Thinkin... | 2026.02 | 556 | - |
| R1-Distill-Qwen-32B | Thinkin... | 2026.02 | 536 | - |
| Dr.SCI-4B-instruct | Instruc... | 2026.02 | 536 | - |
| QwQ-32B | Thinkin... | 2026.02 | 484 | - |
| Qwen3-8B-MegaScience | Instruc... | 2026.02 | 472 | - |
| General-Reasoner-Qw3-14B | Instruc... | 2026.02 | 468 | - |
| Qwen3-14B-MegaScience | Instruc... | 2026.02 | 464 | - |
| Qwen3-4B thinking | Thinkin... | 2026.02 | 452 | - |
| Qwen3-4B non-thinking | Instruc... | 2026.02 | 444 | - |
| Qwen3-8B-VeriFree | Instruc... | 2026.02 | 436 | - |
| General-Reasoner-4B | Instruc... | 2026.02 | 432 | - |
| Qwen3-4B-MegaScience | Instruc... | 2026.02 | 412 | - |
| Qwen3-4B-VeriFree | Instruc... | 2026.02 | 404 | - |
| GPT-4o | Instruc... | 2026.02 | 348 | - |
| Qwen3-4B-Base | Base, M... | 2026.02 | 92 | - |
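The page lists pass@1 and avg@10 without defining them or stating the scale of the reported numbers. The sketch below uses the common definitions, which are assumptions here: pass@1 is the share of questions whose single sampled answer is correct, and avg@10 is per-question accuracy averaged over 10 samples and then over questions. The function names `pass_at_1` and `avg_at_k` are hypothetical, and this may not match how the leaderboard computes its scores.

```python
from typing import Sequence


def pass_at_1(first_sample_correct: Sequence[bool]) -> float:
    """Assumed pass@1: fraction of questions whose single sampled answer is correct."""
    if not first_sample_correct:
        raise ValueError("need at least one question")
    return sum(first_sample_correct) / len(first_sample_correct)


def avg_at_k(samples_correct: Sequence[Sequence[bool]], k: int = 10) -> float:
    """Assumed avg@k: accuracy over k samples per question, averaged across questions."""
    if not samples_correct:
        raise ValueError("need at least one question")
    per_question = []
    for attempts in samples_correct:
        # Every question is expected to have exactly k graded samples.
        if len(attempts) != k:
            raise ValueError(f"expected {k} samples per question, got {len(attempts)}")
        per_question.append(sum(attempts) / k)
    return sum(per_question) / len(per_question)


# Two hypothetical questions: the first is solved in 3 of 10 samples, the second in none.
samples = [[True] * 3 + [False] * 7, [False] * 10]
print(avg_at_k(samples))                     # mean of 0.3 and 0.0
print(pass_at_1([s[0] for s in samples]))    # first sample only: True, False
```

Under these definitions, avg@10 is a lower-variance estimate of the same quantity that pass@1 measures from a single sample.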