
LLaVA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Understanding | LLaVA Evaluation Suite 1.5 | Average Score: 100 | 95 |
| Text Membership Inference Attack | LLaVA LLM Pre-training | AUC: 0.688 | 88 |
| Visual Question Answering | LLaVA-W | ROUGE-L: 49.1 | 56 |
| Multi-modal Understanding | LLaVA Multi-modal Evaluation Suite (GQA, MMB, MME, POPE, SQA, VQAv2, TextVQA, MMMU, SEED-I) v1.6 (test) | Average Score: 100 | 53 |
| Text Membership Inference Attack | LLaVA VLLM Tuning | AUC: 0.993 | 44 |
| Multimodal Understanding | LLaVA Evaluation Suite GQA, MMB, MMB-CN, MME, POPE, SQA, VQAV2, VQAText, VizWiz | GQA: 64.2 | 41 |
| Vision-Language Understanding and Reasoning | LLaVA Multimodal Evaluation Suite (GQA, MMBench, MME, POPE, ScienceQA, VQAv2, TextVQA, SEED-Bench, MM-Vet, VizWiz) 1.5 (test/val) | GQA: 62 | 41 |
| Jailbreak Defense | LLaVA v1.5 | ASR: 3.18 | 36 |
| Toxicity Defense | LLaVA v1.5 | Toxicity Score: 22.35 | 36 |
| General Vision-Language Understanding | LLaVA-OneVision | Score: 66.82 | 36 |
| Multimodal Evaluation | LLaVA Evaluation Suite 7B v1.5 (test) | GQA: 61.9 | 34 |
| Visual Instruction Following | LLaVA-W | Score: 102 | 28 |
| Multimodal Understanding and Question Answering | LLaVA 7B Evaluation Suite (GQA, MMBench, MMBench-CN, MME, POPE, ScienceQA, VQAv2, TextVQA, SEED-Bench, VizWiz) 1.5 | GQA Accuracy: 61.9 | 22 |
| Multimodal Large Language Model Inference Efficiency | LLaVA 13B 1.5 (test) | TTFT (ms): 60.2 | 21 |
| Hallucination detection | llava | AUC ROC: 96.5 | 19 |
| Multimodal Understanding | LLaVA High-IC tasks (MMB, POPE, MME, SEED, GQA) 1.5-7B | Performance Ratio: 94.7 | 18 |
| Multi-modal Understanding and Reasoning | LLaVA-QA90 (test) | Accuracy: 6.69 | 18 |
| Multi-modal Instruction Following | LLaVA-Wild | Average Score: 69.8 | 17 |
| Multimodal Understanding | Aggregate LLaVA 1.5 Suite | Relative Average Score: 98.7 | 17 |
| Multimodal Visual Question Answering | LLaVA Evaluation Suite (GQA, MME, POPE, SQA-Img, VizWiz, VQAv2, MMB-En) 1.5 | GQA: 61.9 | 16 |
| Large Vision-Language Model evaluation | LLaVA Evaluation Suite (MMBench, MME, MM-Vet, ScienceQA) 1.5 (test val) | MMBench: 68.5 | 16 |
| Overall Multimodal Performance | LLaVA 665K Evaluation Suite | Relative Score: 100.3 | 15 |
| Jailbreak Detection | LLaVA Vicuna-7B v1.6 | Accuracy: 92 | 13 |
| Image Captioning | MC-LLaVA | Caption Recall (Single): 83.6 | 11 |
| Vision-Language | LLaVa 1.5 | GQA Score: 63.01 | 11 |
Showing 25 of 68 rows