| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multi-modal Understanding | LLaVA-Bench Wild | LLaVA^W Score | 91.2 | 52 |
| Multimodal Conversation | LLaVA-Bench Wild | Score | 102 | 52 |
| Visual Question Answering | LLaVA-Bench-In-The-Wild | Score | 87.1 | 38 |
| Multimodal Evaluation | LLaVA-Bench | LLaVA-Bench Score | 79.2 | 38 |
| Multimodal Evaluation | LLaVA-bench in-the-wild | Score | 97.3 | 36 |
| Visual Instruction Following | LLaVA-Bench Wild | Score | 81.8 | 35 |
| Multimodal Evaluation | LLaVA-Bench-Wild (LLaVA-W) | Overall Score | 97.3 | 24 |
| Multimodal Understanding | LLaVA-Bench | Overall Score | 91.9 | 23 |
| Multimodal Instruction Following | LLaVA-Bench In-the-Wild | Score | 93.1 | 23 |
| Multimodal Conversation | LLaVA Bench | LLaVA Bench Score | 93.1 | 21 |
| Multimodal Dialogue Evaluation | LLaVA-Bench Wild (test) | Score | 97.7 | 19 |
| Multimodal Reasoning | LLaVA-Bench Wild | GPT-4 Score | 74.5 | 19 |
| Visual Instruction Following Evaluation | LLaVA-Bench | Accuracy | 4.38 | 18 |
| Utility Evaluation | LLaVA-Bench Coco | Score | 92.3 | 13 |
| Visual Question Answering | LLaVA Bench | VQA ASR | 68.31 | 12 |
| General Multimodal Evaluation | LLaVA-Bench Wild | Relative Score | 92.8 | 12 |
| Multimodal Performance Evaluation | LLaVA-Bench In-the-Wild | General Score | 78.9 | 12 |
| Helpfulness Evaluation | LLaVA-Bench | Conversation Score | 93.1 | 11 |
| Visual Question Answering | LLaVA-Bench LLaVAW | Score | 89.1 | 10 |
| Large Multi-modal Model Evaluation | LLaVA-Bench Tool Use (test) | Grounding | 0.893 | 8 |
| Multimodal Tool Use | LLaVA-Bench Tool Use | Grounding | 89.3 | 8 |
| Visual Instruction Following | LLaVA-Bench | Conversation Score | 93.9 | 8 |
| Open-ended Visual Chat | LLaVA-Bench In-the-Wild (full) | Reasoning Score | 90.1 | 8 |
| General Multi-modal Assistant Task | LLaVA-Bench (LLaVA-B) | Score | 77.5 | 7 |
| Open-ended Visual Question Answering | LLaVA Bench v1 (test) | Relevance | 37.18 | 7 |