Graduate-level Science Reasoning on GPQA
[Chart: Accuracy and Average Output Length (Tokens) by method. Highlighted point: Apriel-Reasoner (Ours), Accuracy 69.8, Apr 2, 2026.]
Updated 16d ago
Evaluation Results
Method                      Details                       Date      Accuracy   Average Output Length (Tokens)
Apriel-Reasoner (Ours)      Size=15B                      2026.04   69.8       5,800
Apriel-Base + RLVR w/ LP    Size=15B, Length Penal...     2026.04   68.9       4,500
Apriel-Base                 Size=15B                      2026.04   68.8       10,500
Nemotron-Cascade            Size=14B                      2026.04   68.4       10,600
Phi-4-reasoning             Size=14B                      2026.04   64.8       3,500
Qwen3                       Size=14B                      2026.04   64.3       6,700