| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Tool Retrieval | Mixed | NDCG@10 0.59 | 44 |
| General Problem Solving | Mixed (AIME, GPQA, HLE, HotpotQA, ALFWorld) | Average Score 57.74 | 24 |
| Grammatical Understanding | MIXED (test) | Task 1 Accuracy 95 | 13 |
| Machine Translation | MIXED #3000 English to Luxembourgish (test) | CometScore 0.26 | 13 |
| Named Entity Recognition | Mixed | F1 Score 45.55 | 12 |
| Spoofing Method Identification | Mixed In Domain | Accuracy 96.44 | 11 |
| Authenticity Classification | Mixed In Domain | Accuracy 96.16 | 11 |
| Speech Anti-Spoofing | Mixed In-Domain | EER 0.0306 | 11 |
| Spoofing Region Localization | Mixed In Domain | Seg-F1 91.33 | 9 |
| 4D Mesh Compression | Mixed | Time (ms) 6.89 | 5 |
| Text Classification | Mixed BERT-base (test) | Accuracy 83.3 | 5 |
| Reflection symmetry detection | mixed SDRW LDRS NYU (test) | F1 Score 71.4 | 2 |