MMVet

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| General VQA | MMVet | Score | 83.9 | 63 |
| Multi-modal Understanding | MMVet | Accuracy | 85.67 | 55 |
| Multi-modal Reasoning | MMVet (test) | Accuracy | 80.8 | 49 |
| Multi-modal Vision-Language Evaluation | MMVet | Accuracy | 46.8 | 38 |
| Multi-modal Vision-Language Understanding | MMVet | Score | 81.3 | 38 |
| Self-evaluation | MMVet | AUROC | 0.886 | 36 |
| Multimodal Understanding | MMVet turbo | Accuracy | 74 | 28 |
| Multimodal Understanding | MMVet v2 (0613) | Accuracy | 71.8 | 21 |
| Multimodal Reasoning | MMVet v1 (val) | Accuracy | 33.7 | 19 |
| Multi-modal Reasoning | MMVet | Score | 49.2 | 18 |
| General Multimodal Evaluation | MMVet turbo | Overall Score | 69.7 | 16 |
| Visual Question Answering | MMVet (test) | Score | 67.1 | 16 |
| Visual Language Model Evaluation | MMVet V2 | MMVet V2 Score | 52.6 | 15 |
| Multimodal Understanding | MMVet | MMVet Score | 67.2 | 15 |
| General Visual Question Answering | MMVet 2024b | Score | 66.8 | 13 |
| User Preference & Fluency | MMVet | MMVet User Preference Score | 41.5 | 10 |
| Multimodal Reasoning | MMVet | Token Length | 3,296.8 | 9 |
| Multimodal Understanding | MMVet | Pass@1 | 74.94 | 9 |
| Pointwise Scoring | MMVet pointwise | Kendall's Tau | 0.974 | 9 |
| Multimodal Comprehension | MMVet | Score | 58 | 8 |
| Visual Language Model Evaluation | MMVet | MMVet Score | 40.6 | 7 |
| General Visual Question Answering | MMVet turbo | Score | 76.2 | 7 |
| Vision Understanding | MMVet v1.0 (test) | Score | 36.87 | 6 |
| Vision-language capability | MMVet | Score | 81.2 | 5 |
| Multimodal Understanding | MMVet | Gain Score | 5.97 | 4 |
Showing 25 of 26 rows
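Two of the rows above use ranking-style metrics: AUROC (Self-evaluation) and Kendall's Tau (Pointwise Scoring). The page does not define them, say which variant is used, or say which quantities are compared. As a reference point, here is a minimal sketch of the textbook definitions. The function names `kendall_tau_a` and `auroc` are hypothetical, and the example inputs are placeholders, not MMVet data. Leaderboard entries may instead use tau-b or a library AUROC implementation, and those can differ when there are ties.

```python
from itertools import combinations


def kendall_tau_a(x: list[float], y: list[float]) -> float:
    """Kendall's tau-a: (concordant pairs - discordant pairs) / all pairs.

    Tied pairs count as neither concordant nor discordant. This is an assumed
    variant; many libraries report tau-b by default, which adjusts for ties.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # Positive product: both sequences order items i and j the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Positive/negative ties count as half a win (the Mann-Whitney U form).
    Labels are assumed to be 1 for positive and 0 for negative.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))       # 5 concordant, 1 discordant of 6 pairs
    print(auroc([1, 0, 1, 0], [0.9, 0.4, 0.6, 0.7]))       # 3 of 4 positive/negative pairs ranked correctly
```

The pairwise loops are O(n²) and are meant to mirror the definitions directly, not to be fast. For large inputs, a sort-based implementation gives the same values in O(n log n).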