| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Incorrect Reasoning Path Detection | DeepScaleR | Accuracy: 64.24 | 46 |
| Reasoning | DeepScaler | Accuracy: 57.3 | 30 |
| Inference Efficiency | DeepScaleR-40k (1,024 mathematical problems) | Throughput (tokens/s): 760.74 | 26 |
| Mathematical Reasoning | DeepScaleR | Accuracy: 41.97 | 24 |
| Mathematical Reasoning | DeepScaleR (test) | Greedy Success: 39.2 | 14 |
| Mathematics | DeepScaler | Accuracy: 22.54 | 9 |