| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Error Detection | In-distribution (test) | AUC | 0.8916 | 40 |
| Mathematical Reasoning | In-Distribution Avg | Average Score | 45.6 | 29 |
| Debiasing Effectiveness | In-Distribution (ID) | Mean Effectiveness Score (ID) | 10.2 | 16 |
| Reasoning step reduction | In-Distribution 5K corpus (test) | Savings Rate | 47.5 | 9 |
| Text-to-Speech | In-distribution ID (test) | MOS | 3.87 | 5 |
| Metasurface inverse design | In-Distribution (test) | SG | 74 | 2 |