| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Distributional Alignment | in-domain (test) | JSD 0.11 | 56 |
| Arithmetic Reasoning | In-domain (test) | Accuracy 53.4 | 50 |
| Visual Question Answering | In-Domain Multiple-Choice | OCR-VQA Accuracy 92.9 | 15 |
| Image Retrieval | In-domain (ID) Split | mAP 0.571 | 14 |
| Interactive Segmentation | In-domain (test) | IoU 86.37 | 14 |
| Instruction Following | In-domain | Win Rate 14 | 11 |
| Tool Calling | In-domain (unseen) | Exact Match (EM) 57.24 | 10 |
| Tool Calling | In-domain seen | EM 74.05 | 10 |
| Machine Translation | In-Domain (ID) (val) | BLEU 40.72 | 10 |
| Tool Identification | In-domain unseen | EM 75.86 | 9 |
| Tool Identification | In-domain (seen) | Exact Match (EM) 85.95 | 9 |
| Singing Accompaniment Generation | In-domain | CE Score 7.3543 | 8 |
| LLM Routing | In-domain | Cost (Cost-first) 2.3 | 7 |
| Ambisonics encoding | In-Domain | Coherence 54 | 7 |
| AI-Generated Text Detection | In-domain (test) | OA 100 | 4 |
| Shading Estimation | In-domain | MSE 0.0265 | 3 |
| Albedo Estimation | In-domain | MSE 0.0051 | 3 |
| Depth Estimation | In-domain | REL 0.1072 | 3 |