
WildVision

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Real-world Understanding | WildVision | Win Rate: 80.6 | 25 |
| Human Preferences | WildVision 0617 | Score: 89.4 | 14 |
| Reward Modeling | WildVision-Battle | Accuracy: 89.83 | 13 |
| Pointwise Scoring | WildVision (pointwise) | Kendall's Tau: 0.949 | 9 |
| Multi-modal preference alignment | WildVision | Winning Rate: 40.2 | 6 |
| Multi-modal Chat | WildVision 0617 (test) | General Score: 89.2 | 4 |
| Multimodal Open-ended Evaluation | WildVision (test) | Latxa Win %: 31.44 | 2 |