Mathematical Reasoning on Countdown, GSM8K, MATH500, and SVAMP
[Chart: Accuracy. Best result: DiSE-flexible, 54.92 (Mar 3, 2026).]
Evaluation Results
| Method             | Model             | Date    | Accuracy |
|--------------------|-------------------|---------|----------|
| DiSE-flexible      | LLaDA-1.5-8B      | 2026.03 | 54.92    |
| DiSE-flexible      | LLaDA-Instruct-8B | 2026.03 | 53.79    |
| Baseline (Max Len) | LLaDA-1.5-8B      | 2026.03 | 53.52    |
| Baseline           | LLaDA-1.5-8B      | 2026.03 | 53.37    |
| Baseline (Max Len) | LLaDA-Instruct-8B | 2026.03 | 52.38    |
| Baseline           | LLaDA-Instruct-8B | 2026.03 | 52.24    |