Scientific Reasoning on GPQA Diamond (Score)
Chart (Score by date): top result DeepSeek-R1, 69.7 (Jan 23, 2026)
Evaluation Results
Method                        Model Category   Date      Score
DeepSeek-R1                   Closed-...       2026.01   69.7
Claude3.7-Sonnet              Closed-...       2026.01   67.7
LoGos(32B)                    Open-so...       2026.01   63.6
o1-mini                       Closed-...       2026.01   61.1
DeepSeek-R1-Distill-Qwen-32B  Open-so...       2026.01   56.1
Qwen2.5-32B-Instruct          Open-so...       2026.01   46.0
DeepSeek-R1-Distill-Qwen-7B   Open-so...       2026.01   41.4
Qwen2.5-7B-Instruct           Open-so...       2026.01   39.9
LoGos(7B)                     Open-so...       2026.01   37.9
Qwen2.5-32B-Base              Open-so...       2026.01   35.9
Qwen2.5-7B-Base               Open-so...       2026.01   28.3