Math reasoning on MATH500 (Coverage and Success Rate)
[Chart: Coverage by date. Top entry: TOFU, Coverage 86 (Apr 30, 2026). Metrics: Coverage, Average Success Rate.]
Updated 1mo ago
Evaluation Results
Method  Backbone            Date     Coverage  Average Success Rate
TOFU    QWEN-2.5-MATH-7B    2026.04  86        53.8
CE      QWEN-2.5-MATH-7B    2026.04  84.8      54.2
GEM     QWEN-2.5-MATH-7B    2026.04  83.2      59
TOFU    QWEN-2.5-MATH...    2026.04  80.6      50.3
CE      QWEN-2.5-MATH...    2026.04  78.4      53.9
GEM     QWEN-2.5-MATH...    2026.04  78.4      53.4
TOFU    DEEPSEEK-MATH-7B    2026.04  72.6      33
GEM     DEEPSEEK-MATH-7B    2026.04  72.2      36.4
CE      DEEPSEEK-MATH-7B    2026.04  71.4      32.7