OpenR1-Math

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Mathematical Reasoning | OpenR1-Math | Avg@8: 62.9 | 14
Language Modeling | OpenR1-Math seed-1 representative | Perplexity (PPL): 2.86 | 9
Distillation Data Detection | OpenR1-Math 220k (balanced evaluation set) | AUC: 0.665 | 8
Mathematical Reasoning | OpenR1-Math-220k unseen | Accuracy: 46 | 6