Scientific Reasoning on Aggregate GPQA, HLE, MMLU-Pro

Average Score: 44.6

Dr.SCI-4B-think

Updated 1mo ago

Evaluation Results

Method	Date	Average Score
Dr.SCI-4B-think	2026.02	44.6
o1-mini	2026.02	43.4
R1-0528-Qwen3-8B	2026.02	41.8
R1-Distill-Qwen-32B	2026.02	41
Dr.SCI-4B-instruct	2026.02	40.2
GPT-4o	2026.02	39
Qwen3-14B-MegaScience	2026.02	39
Qwen3-4B thinking	2026.02	38.9
General-Reasoner-Qw3-14B	2026.02	38.9
QwQ-32B	2026.02	37.4
Qwen3-8B-MegaScience	2026.02	35.5
Qwen3-8B-VeriFree	2026.02	34.6
Qwen3-4B-VeriFree	2026.02	32.3
General-Reasoner-4B	2026.02	31.6
Qwen3-4B-MegaScience	2026.02	29.3
Qwen3-4B non-thinking	2026.02	29.2
Qwen3-4B-Base	2026.02	24.5