Countdown on Countdown-4 (test)
Best result: A-HPO, Reward 0.65
[Chart: Reward by date; y-axis: Reward; date shown: May 28, 2026]
Updated 5d ago
Evaluation Results
Method  Configuration              Date     Reward
A-HPO   Model=Qwen2.5-7B-Instr...  2026.05  0.65
A-HPO   Model=Qwen2.5-7B-Instr...  2026.05  0.63
GRPO    Model=Qwen2.5-7B-Instr...  2026.05  0.58
GRPO    Model=Qwen2.5-7B-Instr...  2026.05  0.58
A-HPO   Model=Llama3.1-3B-Inst...  2026.05  0.58
GRPO    Model=Llama3.1-3B-Inst...  2026.05  0.58
A-HPO   Model=Llama3.1-3B-Inst...  2026.05  0.54
GRPO    Model=Llama3.1-3B-Inst...  2026.05  0.40
A-HPO   Model=Qwen2.5-1.5B-Ins...  2026.05  0.23
GRPO    Model=Qwen2.5-1.5B-Ins...  2026.05  0.20
GRPO    Model=Qwen2.5-1.5B-Ins...  2026.05  0.19
A-HPO   Model=Qwen2.5-1.5B-Ins...  2026.05  0.17