Graduate-level Science Reasoning on GPQA (test)
Best result: 34.3 Accuracy (E1-Math)
[Chart: Accuracy / Output Length over time; data point dated May 16, 2025]
Updated 1mo ago
Evaluation Results
Method                    Configuration              Date     Accuracy  Output Length
E1-Math                   backbone=1.5B, budget=...  2025.05  34.3       2,758.77
DeepSeek-R1-Distill-Qwen  backbone=1.5B              2025.05  33.04     11,780.87
L1                        setting=Max(3600)          2025.05  31.92      3,892.47
SelfBudgeter              backbone=1.5B              2025.05  30.65      3,326.83
E1-Math                   backbone=1.5B, budget=...  2025.05  26.34      1,278.19
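
The results above pair each accuracy score with an output length, so one way to read them is relative to a single reference row. The following is a minimal sketch of that arithmetic, not part of the leaderboard itself. It assumes DeepSeek-R1-Distill-Qwen is taken as the reference; the names `ROWS`, `BASELINE`, and `compare_to_baseline` are hypothetical. The elided `budget=...` values are copied as-is.

```python
# Rows copied from the results above: (method, configuration, accuracy, output length).
ROWS = [
    ("E1-Math", "backbone=1.5B, budget=...", 34.3, 2758.77),
    ("DeepSeek-R1-Distill-Qwen", "backbone=1.5B", 33.04, 11780.87),
    ("L1", "setting=Max(3600)", 31.92, 3892.47),
    ("SelfBudgeter", "backbone=1.5B", 30.65, 3326.83),
    ("E1-Math", "backbone=1.5B, budget=...", 26.34, 1278.19),
]

# Assumption: this row is used as the reference point for the comparison.
BASELINE = "DeepSeek-R1-Distill-Qwen"


def compare_to_baseline(rows, baseline_name):
    """Return the accuracy difference and output-length ratio of each row vs. the baseline."""
    _, _, base_acc, base_len = next(r for r in rows if r[0] == baseline_name)
    comparison = []
    for method, config, acc, length in rows:
        comparison.append({
            "method": method,
            "config": config,
            # Difference in accuracy relative to the baseline row.
            "accuracy_delta": round(acc - base_acc, 2),
            # Fraction of the baseline's output length used by this row.
            "length_ratio": round(length / base_len, 3),
        })
    return comparison


COMPARISON = compare_to_baseline(ROWS, BASELINE)

for row in COMPARISON:
    print(f'{row["method"]:<26} {row["config"]:<27} '
          f'{row["accuracy_delta"]:+6.2f}  {row["length_ratio"]:.3f}')
```

A length ratio below 1 means the row produced shorter outputs than the reference row; the accuracy delta shows what that cost or gained in accuracy.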