| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|---|
| Video Understanding | Multiple Aggregate | Average Score: 69.8 | 18 |
| Generalist Multi-task Evaluation | Multiple (ImageNet-1K, COCO) | Mean Delta: -11.8 | 13 |
| Factuality Detection | Multiple (TriviaQA, HotpotQA, CSQA) | Average AUROC: 72.9 | 4 |
| Code Generation | Multiple | Score: 78.51 | 3 |
| Controllable Language Generation | Multiple Distributional Constraint | Ctrl: 0.95 | 3 |