
LLaVA-Bench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multi-modal Understanding | LLaVA-Bench Wild | LLaVA^W Score | 91.2 | 52 |
| Multimodal Conversation | LLaVA-Bench Wild | Score | 102 | 52 |
| Visual Question Answering | LLaVA-Bench-In-The-Wild | Score | 87.1 | 38 |
| Multimodal Evaluation | LLaVA-Bench | LLaVA-Bench Score | 79.2 | 38 |
| Multimodal Evaluation | LLaVA-bench in-the-wild | Score | 97.3 | 36 |
| Visual Instruction Following | LLaVA-Bench Wild | Score | 81.8 | 35 |
| Multimodal Evaluation | LLaVA-Bench-Wild (LLaVA-W) | Overall Score | 97.3 | 24 |
| Multimodal Understanding | LLaVA-Bench | Overall Score | 91.9 | 23 |
| Multimodal Instruction Following | LLaVA-Bench In-the-Wild | Score | 93.1 | 23 |
| Multimodal Conversation | LLaVA Bench | LLaVA Bench Score | 93.1 | 21 |
| Multimodal Dialogue Evaluation | LLaVA-Bench Wild (test) | Score | 97.7 | 19 |
| Multimodal Reasoning | LLaVA-Bench Wild | GPT-4 Score | 74.5 | 19 |
| Visual Instruction Following Evaluation | LLaVA-Bench | Accuracy | 4.38 | 18 |
| Utility Evaluation | LLaVA-Bench Coco | Score | 92.3 | 13 |
| Visual Question Answering | LLaVA Bench | VQA ASR | 68.31 | 12 |
| General Multimodal Evaluation | LLaVA-Bench Wild | Relative Score | 92.8 | 12 |
| Multimodal Performance Evaluation | LLaVA-Bench In-the-Wild | General Score | 78.9 | 12 |
| Helpfulness Evaluation | LLaVA-Bench | Conversation Score | 93.1 | 11 |
| Visual Question Answering | LLaVA-Bench LLaVAW | Score | 89.1 | 10 |
| Large Multi-modal Model Evaluation | LLaVA-Bench Tool Use (test) | Grounding | 0.893 | 8 |
| Multimodal Tool Use | LLaVA-Bench Tool Use | Grounding | 89.3 | 8 |
| Visual Instruction Following | LLaVA-Bench | Conversation Score | 93.9 | 8 |
| Open-ended Visual Chat | LLaVA-Bench In-the-Wild (full) | Reasoning Score | 90.1 | 8 |
| General Multi-modal Assistant Task | LLaVA-Bench (LLaVA-B) | Score | 77.5 | 7 |
| Open-ended Visual Question Answering | LLaVA Bench v1 (test) | Relevance | 37.18 | 7 |
Showing 25 of 35 rows