LLaVA-Bench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multi-modal Understanding | LLaVA-Bench Wild | LLaVA^W Score | 91.2 | 86 |
| Multimodal Understanding | LLaVA-Bench | Overall Score | 91.9 | 72 |
| Multimodal Conversation | LLaVA-Bench Wild | Score | 102 | 65 |
| Visual Instruction Following | LLaVA-Bench Wild | Score | 102.3 | 60 |
| Multimodal Evaluation | LLaVA-bench in-the-wild | Score | 121.88 | 56 |
| Complex Reasoning | LLaVA Bench (val) | Perplexity | 2.1875 | 44 |
| Visual Question Answering | LLaVA-Bench-In-The-Wild | Score | 87.1 | 38 |
| Multimodal Evaluation | LLaVA-Bench | LLaVA-Bench Score | 79.2 | 38 |
| Multimodal Evaluation | LLaVA-Bench-Wild (LLaVA-W) | Overall Score | 97.3 | 24 |
| Multimodal Instruction Following | LLaVA-Bench In-the-Wild | Score | 93.1 | 23 |
| Multimodal Conversation | LLaVA Bench | LLaVA Bench Score | 93.1 | 21 |
| Multimodal Dialogue Evaluation | LLaVA-Bench Wild (test) | Score | 97.7 | 19 |
| Multimodal Reasoning | LLaVA-Bench Wild | GPT-4 Score | 74.5 | 19 |
| Visual Instruction Following Evaluation | LLaVA-Bench | Accuracy | 4.38 | 18 |
| Visual Instruction Following | LLaVA-Bench | Overall Score | 79.1 | 15 |
| Utility Evaluation | LLaVA-Bench Coco | Score | 92.3 | 13 |
| Visual Question Answering | LLaVA Bench | VQA ASR | 68.31 | 12 |
| General Multimodal Evaluation | LLaVA-Bench Wild | Relative Score | 92.8 | 12 |
| Multimodal Performance Evaluation | LLaVA-Bench In-the-Wild | General Score | 78.9 | 12 |
| Open-ended Generation | LLaVA-Bench In-the-Wild | Ref Score | 62.46 | 11 |
| Open-ended Generation | LLaVA-Bench COCO | Reference Score | 85.76 | 11 |
| Helpfulness Evaluation | LLaVA-Bench | Conversation Score | 93.1 | 11 |
| Visual Question Answering | LLaVA-Bench LLaVAW | Score | 89.1 | 10 |
| Large Multi-modal Model Evaluation | LLaVA-Bench Tool Use (test) | Grounding | 0.893 | 8 |
| Multimodal Tool Use | LLaVA-Bench Tool Use | Grounding | 89.3 | 8 |
Showing 25 of 43 rows