Graduate-level Reasoning on GPQA main (Pass@10)
Top result: 76.7 Pass@10 (Multi-chain), Jan 2, 2026
Evaluation Results
| Method | Model | Date | Pass@10 |
| Multi-chain | Qwen2.5-32B-Inst... | 2026.01 | 76.7 |
| Entropy-Tree | Qwen2.5-32B-Inst... | 2026.01 | 75.93 |
| Entropy-Tree | Qwen2.5-14B-Inst... | 2026.01 | 73.53 |
| Multi-chain | Qwen2.5-14B-Inst... | 2026.01 | 72.11 |
| Entropy-Tree | Qwen2.5-7B-Instruct | 2026.01 | 72.07 |
| Multi-chain | Qwen2.5-7B-Instruct | 2026.01 | 71.56 |