Arithmetic Reasoning on Countdown (Accuracy)
[Chart: Accuracy over time, Sep 25, 2025 to May 21, 2026; top entry: LIFT2 at 33.6]
Updated 9d ago
Evaluation Results
Method                 Details                    Date     Accuracy
---------------------  -------------------------  -------  --------
LIFT2                  Backbone=Dream-7B, H=2     2026.05  33.6
Instruct Model + GIFT  Shots=5-shot, Training...  2025.09  27.5
Instruct Model + GIFT  Shots=5-shot, Training...  2025.09  26
LIFT3                  Backbone=Dream-7B, H=3     2026.05  25.6
Vanilla                Backbone=Dream-7B          2026.05  25
GIFT                   Backbone=Dream-7B          2026.05  23.4
Instruct Model + SFT   Shots=5-shot, Training...  2025.09  23.2
CART                   Backbone=Dream-7B          2026.05  22.3
Instruct Model + SFT   Shots=5-shot, Training...  2025.09  21.7
Instruct               Backbone=Dream-7B          2026.05  21.1
GIFT                   Training Dataset=s1K,...   2025.09  0.281
GIFT                   Training Dataset=s1K-1...  2025.09  0.218
GIFT                   Training Dataset=Tulu3...  2025.09  0.213
SFT                    Training Dataset=s1K,...   2025.09  0.211
SFT                    Training Dataset=s1K-1...  2025.09  0.207
GIFT                   Training Dataset=openr...  2025.09  0.188
SFT                    Training Dataset=Tulu3...  2025.09  0.182
SFT                    Training Dataset=openr...  2025.09  0.173
LLaDA-8B-Instruct      Evaluation Protocol=0-...  2025.09  0.17