| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | CHAMP standard (test) | Accuracy 68.5 | 36 |
| Mathematical Reasoning | CHAMP | Accuracy 68.2 | 32 |
| Retrieval-Augmented Generation | CHAMP | Accuracy 45.2 | 12 |
| Skill retrieval | CHAMP | Recall@1 22.5 | 11 |
| Skill retrieval | CHAMP | nDCG@1 35.4 | 11 |