Knowledge Reasoning on GPQA Diamond
[Chart: Accuracy (avg@8) over time, Dec 18, 2025 to Apr 13, 2026. Top result: Qwen3, 64.1]
Updated 5d ago
Evaluation Results
Method                        Configuration              Date     Accuracy (avg@8)
Qwen3                         Params=32B                 2026.04  64.1
I-DLM                         Params=32B, Decoding=I...  2026.04  62.1
Qwen3                         Params=8B                  2026.04  58.9
I-DLM                         Params=8B, Decoding=IS...  2026.04  55.6
DeepSeek-R1-Distill-Qwen-7B   Architecture=Dense, #...   2025.12  47.1
Sigma-MoE-Tiny                Architecture=MoE, # Ac...  2025.12  46.4
LLaDA-2.1-mini                Params=16B                 2026.04  46
DeepSeek-R1-Distill-Llama-8B  Architecture=Dense, #...   2025.12  43.2
SDAR                          Params=8B                  2026.04  40.2
Qwen3-1.7B                    Architecture=Dense, #...   2025.12  40.1
Phi-3.5-MoE                   Architecture=MoE, # Ac...  2025.12  36.8
SDAR                          Params=30B                 2026.04  36.7
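
The page does not define the avg@8 metric. Assuming the common reading (each question is answered in 8 independently sampled runs, and accuracy is averaged over those runs), it could be computed roughly as in the sketch below. The function name `avg_at_k` and the input layout are hypothetical and are not taken from the leaderboard.

```python
from statistics import mean


def avg_at_k(correct: list[list[bool]]) -> float:
    """Return mean accuracy over k sampled runs, as a percentage.

    correct[q][r] is True when run r answered question q correctly.
    Assumption: every question has the same number of runs (k = 8 for avg@8).
    """
    if not correct:
        raise ValueError("need at least one question")
    k = len(correct[0])
    if k == 0 or any(len(runs) != k for runs in correct):
        raise ValueError("every question needs the same non-zero number of runs")
    # Fraction of correct runs per question, then averaged across questions.
    per_question = [sum(runs) / k for runs in correct]
    return 100.0 * mean(per_question)


if __name__ == "__main__":
    # Two toy questions with 8 runs each: one always right, one right half the time.
    toy = [[True] * 8, [True, False] * 4]
    print(avg_at_k(toy))
```

Because every question has the same number of runs, averaging per question and then across questions gives the same value as pooling all runs together.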