
LLaVA

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Multimodal Understanding | LLaVA Evaluation Suite 1.5 | Average Score | 100 | 95 |
| Text Membership Inference Attack | LLaVA LLM Pre-training | AUC | 0.688 | 88 |
| Visual Question Answering | LLaVA-W | ROUGE-L | 49.1 | 56 |
| Multi-modal Understanding | LLaVA Multi-modal Evaluation Suite (GQA, MMB, MME, POPE, SQA, VQAv2, TextVQA, MMMU, SEED-I) v1.6 (test) | Average Score | 100 | 53 |
| Text Membership Inference Attack | LLaVA VLLM Tuning | AUC | 0.993 | 44 |
| Multimodal Understanding | LLaVA Evaluation Suite (GQA, MMB, MMB-CN, MME, POPE, SQA, VQAv2, VQAText, VizWiz) | GQA | 64.2 | 41 |
| Jailbreak Defense | LLaVA v1.5 | ASR | 3.18 | 36 |
| Toxicity Defense | LLaVA v1.5 | Toxicity Score | 22.35 | 36 |
| General Vision-Language Understanding | LLaVA-OneVision | Score | 66.82 | 36 |
| Visual Instruction Following | LLaVA-W | Score | 102 | 28 |
| Multimodal Large Language Model Inference Efficiency | LLaVA 1.5 13B (test) | TTFT (ms) | 60.2 | 21 |
| Hallucination Detection | LLaVA | AUC-ROC | 96.5 | 19 |
| Multimodal Understanding | LLaVA High-IC tasks (MMB, POPE, MME, SEED, GQA) 1.5-7B | Performance Ratio | 94.7 | 18 |
| Multi-modal Understanding and Reasoning | LLaVA-QA90 (test) | Accuracy | 6.69 | 18 |
| Multi-modal Instruction Following | LLaVA-Wild | Average Score | 69.8 | 17 |
| Multimodal Understanding | Aggregate LLaVA 1.5 Suite | Relative Average Score | 98.7 | 17 |
| Vision-Language Understanding and Reasoning | LLaVA Multimodal Evaluation Suite (GQA, MMBench, MME, POPE, ScienceQA, VQAv2, TextVQA, SEED-Bench, MM-Vet, VizWiz) 1.5 (test/val) | GQA | 0.619 | 16 |
| Large Vision-Language Model Evaluation | LLaVA Evaluation Suite (MMBench, MME, MM-Vet, ScienceQA) 1.5 (test/val) | MMBench | 68.5 | 16 |
| Jailbreak Detection | LLaVA Vicuna-7B v1.6 | Accuracy | 92 | 13 |
| Image Captioning | MC-LLaVA | Caption Recall (Single) | 83.6 | 11 |
| Vision-Language | LLaVA 1.5 | GQA Score | 63.01 | 11 |
| Jailbreak Attack | LLaVA 1.5 | ASR | 100 | 10 |
| Vision Understanding | LLaVA-W | Score | 63 | 10 |
| Large Vision-Language Model Evaluation | LLaVA (bench) | Score | 77.8 | 10 |
| Adversarial Attack | LLaVA | CLIP Similarity (RN-50) | 0.2427 | 9 |
Showing 25 of 48 rows
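
Several rows above report AUC (area under the ROC curve) as the headline metric, e.g., the two membership inference attack entries and the hallucination detection entry. As a minimal sketch of how such a score is computed with the standard scikit-learn call; the labels and scores below are illustrative placeholders, not data from any of these benchmarks:

```python
# Minimal sketch: computing the ROC AUC reported for the membership-inference
# and hallucination-detection rows above. Labels and scores are made-up
# placeholders, not benchmark data.
from sklearn.metrics import roc_auc_score

# 1 = positive class (e.g., "sample was in the training set" for membership
# inference, or "response is hallucinated" for hallucination detection).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]

# Detector/attack confidence scores for the positive class.
y_score = [0.91, 0.35, 0.78, 0.42, 0.20, 0.55, 0.84, 0.10]

auc = roc_auc_score(y_true, y_score)
print(f"AUC: {auc:.3f}")  # 0.5 = chance-level detector, 1.0 = perfect separation
```

Read against that scale, the 0.993 AUC for the membership inference attack on LLaVA VLLM tuning indicates near-perfect separation of training from non-training samples, while the 0.688 AUC on LLM pre-training is only modestly better than chance.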