Mathematical Reasoning on LIMO
[Chart: Length by method (metric views: Length, Accuracy, Pass@1); highest plotted point: Full CoT SFT, 17,474, Jan 31, 2026]
Evaluation Results
Method                 | Setting                  | Date    | Length | Accuracy | Pass@1
-----------------------|--------------------------|---------|--------|----------|-------
Full CoT SFT           | Decoding Strategy=Greedy | 2026.01 | 17,474 | 0.304    | -
Segment Selective SFT  | Decoding Strategy=Greedy | 2026.01 | 16,005 | 0.338    | -
Full CoT SFT           | Decoding Strategy=Samp...| 2026.01 | 15,067 | -        | 33
Segment Selective SFT  | Decoding Strategy=Samp...| 2026.01 | 14,669 | -        | 33.5
LLaMA3.1-8B-Instruct   | Decoding Strategy=Greedy | 2026.01 | 9,691  | 0.24     | -
LLaMA3.1-8B-Instruct   | Decoding Strategy=Samp...| 2026.01 | 4,161  | -        | 23.5
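The rows above can be compared pairwise within each decoding strategy. The following is a minimal sketch, using only numbers transcribed from the table, that computes how Segment Selective SFT differs from Full CoT SFT in Length and score. Greedy rows report Accuracy and sampling rows report Pass@1, which appear to be on different scales, so the sketch only compares rows that share a strategy. The names `RESULTS` and `compare` are hypothetical, and the key "Samp..." simply reuses the truncated setting label from the table.

```python
# Values transcribed from the Evaluation Results table (LIMO benchmark).
# "score" is Accuracy for Greedy rows and Pass@1 for "Samp..." rows;
# the two metrics appear to use different scales, so only compare
# rows that share a decoding strategy.
RESULTS = {
    ("Full CoT SFT", "Greedy"): {"length": 17474, "score": 0.304},
    ("Segment Selective SFT", "Greedy"): {"length": 16005, "score": 0.338},
    ("Full CoT SFT", "Samp..."): {"length": 15067, "score": 33.0},
    ("Segment Selective SFT", "Samp..."): {"length": 14669, "score": 33.5},
}


def compare(strategy: str) -> dict:
    """Hypothetical helper: Segment Selective SFT minus Full CoT SFT for one strategy."""
    base = RESULTS[("Full CoT SFT", strategy)]
    new = RESULTS[("Segment Selective SFT", strategy)]
    length_delta = new["length"] - base["length"]
    return {
        "length_delta": length_delta,
        # Relative change in Length, as a percentage of the Full CoT SFT value.
        "length_change_pct": 100 * length_delta / base["length"],
        "score_delta": new["score"] - base["score"],
    }


if __name__ == "__main__":
    for strategy in ("Greedy", "Samp..."):
        d = compare(strategy)
        print(
            f"{strategy}: length {d['length_delta']:+d} "
            f"({d['length_change_pct']:+.1f}%), score {d['score_delta']:+.3f}"
        )
    # Greedy: length -1469 (-8.4%), score +0.034
    # Samp...: length -398 (-2.6%), score +0.500
```

Under both decoding strategies, the Segment Selective SFT row shows a shorter Length and a slightly higher score than the Full CoT SFT row.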