Mathematical Reasoning on HLE decontaminated (Accuracy)
[Chart: Accuracy over time. Highlighted point: DeepMath-103K, 8.4 accuracy, May 26, 2026. Updated 7d ago.]
Evaluation Results
Method               Details                    Date     Accuracy
DeepMath-103K        Backbone=Qwen3-1.7B, S...  2026.05  8.4
DAPO++               Backbone=Qwen3-1.7B, S...  2026.05  6.8
DeepScaleR           Backbone=Qwen3-1.7B, S...  2026.05  6.3
Qwen3-1.7B-Base      Backbone=Qwen3-1.7B, S...  2026.05  5.9
DAPO-Math-17k        Backbone=Qwen3-1.7B, S...  2026.05  5.9
Qwen3-8B-Base        Backbone=Qwen3-8B, Sam...  2026.05  5.7
Skywork-OR1-RL-Data  Backbone=Qwen3-1.7B, S...  2026.05  5.1
OpenR1-Math-220k     Backbone=Qwen3-1.7B, S...  2026.05  4.7
DAPO++               Backbone=Qwen3-8B, Sam...  2026.05  4.7
DeepScaleR           Backbone=Qwen3-8B, Sam...  2026.05  4.5
DeepMath-103K        Backbone=Qwen3-8B, Sam...  2026.05  4.5
DAPO-Math-17k        Backbone=Qwen3-8B, Sam...  2026.05  4.1
OpenR1-Math-220k     Backbone=Qwen3-8B, Sam...  2026.05  3.9
Skywork-OR1-RL-Data  Backbone=Qwen3-8B, Sam...  2026.05  3.1