| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Real-world Understanding | WildVision | Win Rate 80.6 | 25 |
| Human Preferences | WildVision 0617 | Score 89.4 | 14 |
| Reward Modeling | WildVision-Battle | Accuracy 89.83 | 13 |
| Pointwise Scoring | WildVision (pointwise) | Kendall's Tau 0.949 | 9 |
| Multi-modal preference alignment | WildVision | Winning Rate 40.2 | 6 |
| Multi-modal Chat | WildVision 0617 (test) | General Score 89.2 | 4 |
| Multimodal Open-ended Evaluation | WildVision (test) | Latxa Win % 31.44 | 2 |