Complex Reasoning on SCoRE (test)
[Chart: Accuracy by date. Highlighted point: SelfBudgeter, 16.26 Accuracy, May 16, 2025. Selectable metrics: Accuracy, Length.]
Updated 1mo ago
Evaluation Results
Method                    Setting                    Date     Accuracy  Length
SelfBudgeter              backbone=1.5B              2025.05  16.26     4,491.35
L1                        setting=Max(3600)          2025.05  13.69     5,145.91
E1-Math                   backbone=1.5B, budget=...  2025.05  12.23     3,327.37
DeepSeek-R1-Distill-Qwen  backbone=1.5B              2025.05  10.14     11,695.94
E1-Math                   backbone=1.5B, budget=...  2025.05  6.69      1,272.04