
DeepScaleR

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Incorrect Reasoning Path Detection | DeepScaleR | Accuracy: 64.24 | 46 |
| Reasoning | DeepScaler | Accuracy: 57.3 | 30 |
| Inference Efficiency | DeepScaleR-40k (1,024 mathematical problems) | Throughput (tokens/s): 760.74 | 26 |
| Mathematical Reasoning | DeepScaleR | Accuracy: 41.97 | 24 |
| Mathematical Reasoning | DeepScaleR (test) | Greedy Success: 39.2 | 14 |
| Mathematics | DeepScaler | Accuracy: 22.54 | 9 |