Mathematical Reasoning on MATH 500 (Acc, Len, L-Acc)
[Chart: Accuracy over time, series "Effi. Reasoning"; top value 82.6; date shown: Jun 9, 2025; selectable metrics: Accuracy, Solution Length, Length Accuracy]
Evaluation Results

| Method | Base Model | Date | Accuracy (Acc) | Solution Length (Len) | Length Accuracy (L-Acc) |
|---|---|---|---|---|---|
| | DeepSeek-R1... | 2025.06 | 82.6 | 2,395 | 69.5 |
| Bingo-A | DeepSeek-R1... | 2025.06 | 82.2 | 894 | 77.6 |
| Vanilla PPO | DeepSeek-R1... | 2025.06 | 81.4 | 2,771 | 66.2 |
| DAST | DeepSeek-R1... | 2025.06 | 81.2 | 1,770 | 71.9 |
| Bingo-E | DeepSeek-R1... | 2025.06 | 80.6 | 779 | 76.7 |
| kimi-k1.5 | DeepSeek-R1... | 2025.06 | 80.4 | 1,692 | 71.6 |
| Demystifying | DeepSeek-R1... | 2025.06 | 80.2 | 1,411 | 73.0 |
| O1-Pruner | DeepSeek-R1... | 2025.06 | 74.4 | 991 | 69.8 |
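The page does not say how Length Accuracy is computed. As an unconfirmed observation, every L-Acc value in the results above is reproduced to one decimal place by L-Acc = Acc × sqrt(1 − Len / 8192). Neither this formula nor the 8192 budget (assumed here to be a maximum generation length in tokens) is stated by the source; both are assumptions. The sketch below checks the hypothesis row by row. The names `length_accuracy`, `check_rows` and `MAX_LEN` are hypothetical.

```python
import math

# Rows transcribed from the results above: (method, Acc, Len, reported L-Acc).
# The first row's method name is not shown on the page.
ROWS = [
    ("(unnamed)", 82.6, 2395, 69.5),
    ("Bingo-A", 82.2, 894, 77.6),
    ("Vanilla PPO", 81.4, 2771, 66.2),
    ("DAST", 81.2, 1770, 71.9),
    ("Bingo-E", 80.6, 779, 76.7),
    ("kimi-k1.5", 80.4, 1692, 71.6),
    ("Demystifying", 80.2, 1411, 73.0),
    ("O1-Pruner", 74.4, 991, 69.8),
]

# ASSUMPTION: an 8192-token maximum generation length; not stated on the page.
MAX_LEN = 8192


def length_accuracy(acc: float, length: float, max_len: int = MAX_LEN) -> float:
    """Hypothesized L-Acc: accuracy scaled by sqrt(1 - length / max_len)."""
    return acc * math.sqrt(1.0 - length / max_len)


def check_rows(rows=ROWS, tol: float = 0.05):
    """Return (method, predicted, reported, matches) for each row.

    tol = 0.05 because a value rounded to one decimal can differ from
    the unrounded value by at most 0.05.
    """
    results = []
    for method, acc, length, reported in rows:
        predicted = length_accuracy(acc, length)
        results.append((method, predicted, reported, abs(predicted - reported) < tol))
    return results


if __name__ == "__main__":
    for method, predicted, reported, ok in check_rows():
        print(f"{method:<13} predicted={predicted:6.2f} reported={reported:5.1f} match={ok}")
```

Under this assumption all eight rows report `match=True`. If the benchmark defines L-Acc differently, only `length_accuracy` would need to change.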