| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Incorrect Reasoning Path Detection | DeepScaleR | Accuracy: 64.24 | 46 |
| Inference Efficiency | DeepScaleR-40k (1,024 mathematical problems) | Throughput (tokens/s): 760.74 | 26 |
| Mathematical Reasoning | DeepScaleR (test) | Greedy Success: 39.2 | 14 |