Graduate-level Reasoning on GPQA diamond (Pass@10)
Top result: 78.29 Pass@10 (Multi-chain), Jan 2, 2026. Updated 3mo ago.
Evaluation Results
Method        Model                 Date     Pass@10
Multi-chain   Qwen2.5-32B-Inst...   2026.01  78.29
Entropy-Tree  Qwen2.5-32B-Inst...   2026.01  77.98
Entropy-Tree  Qwen2.5-7B-Instruct   2026.01  74.81
Entropy-Tree  Qwen2.5-14B-Inst...   2026.01  74.1
Multi-chain   Qwen2.5-14B-Inst...   2026.01  73.25
Multi-chain   Qwen2.5-7B-Instruct   2026.01  70.83