
DeepScaleR

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Incorrect Reasoning Path Detection | DeepScaleR | Accuracy: 64.24 | 46 |
| Inference Efficiency | DeepScaleR-40k (1,024 mathematical problems) | Throughput (tokens/s): 760.74 | 26 |
| Mathematical Reasoning | DeepScaleR (test) | Greedy Success: 39.2 | 14 |