| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Code | CRUX | Accuracy @5: 55.08 | 27 | |
| Code Reasoning | CRUX | Accuracy: 87.37 | 26 | |
| Code Reasoning | CRUX official (test) | Pass@1 Accuracy: 51.1 | 20 | |
| Code Generation | CRUX | Score (%): 57.2 | 18 | |
| Nugget-based retrieval | CRUX Multi-News | Precision@10: 38 | 14 | |
| Nugget-based retrieval | CRUX DUC04 | Precision@10: 73.8 | 14 | |