LLaVA

Benchmarks

Task Name	Dataset Name	SOTA Metric	SOTA Result	(unlabeled)
Multimodal Understanding	LLaVA Evaluation Suite 1.5	Average Score	100	95
Text Membership Inference Attack	LLaVA LLM Pre-training	AUC	0.688	88
Visual Question Answering	LLaVA-W	ROUGE-L	49.1	56
Multi-modal Understanding	LLaVA Multi-modal Evaluation Suite (GQA, MMB, MME, POPE, SQA, VQAv2, TextVQA, MMMU, SEED-I) v1.6 (test)	Average Score	100	53
Text Membership Inference Attack	LLaVA VLLM Tuning	AUC	0.993	44
Multimodal Understanding	LLaVA Evaluation Suite GQA, MMB, MMB-CN, MME, POPE, SQA, VQAV2, VQAText, VizWiz	GQA	64.2	41
Vision-Language Understanding and Reasoning	LLaVA Multimodal Evaluation Suite (GQA, MMBench, MME, POPE, ScienceQA, VQAv2, TextVQA, SEED-Bench, MM-Vet, VizWiz) 1.5 (test/val)	GQA	62	41
Jailbreak Defense	LLaVA v1.5	ASR	3.18	36
Toxicity Defense	LLaVA v1.5	Toxicity Score	22.35	36
General Vision-Language Understanding	LLaVA-OneVision	Score	66.82	36
Multimodal Evaluation	LLaVA Evaluation Suite 7B v1.5 (test)	GQA	61.9	34
Visual Instruction Following	LLaVA-W	Score	102	28
Multimodal Understanding and Question Answering	LLaVA 7B Evaluation Suite (GQA, MMBench, MMBench-CN, MME, POPE, ScienceQA, VQAv2, TextVQA, SEED-Bench, VizWiz) 1.5	GQA Accuracy	61.9	22
Multimodal Large Language Model Inference Efficiency	LLaVA 13B 1.5 (test)	TTFT (ms)	60.2	21
Hallucination detection	llava	AUC ROC	96.5	19
Multimodal Understanding	LLaVA High-IC tasks (MMB, POPE, MME, SEED, GQA) 1.5-7B	Performance Ratio	94.7	18
Multi-modal Understanding and Reasoning	LLaVA-QA90 (test)	Accuracy	6.69	18
Multi-modal Instruction Following	LLaVA-Wild	Average Score	69.8	17
Multimodal Understanding	Aggregate LLaVA 1.5 Suite	Relative Average Score	98.7	17
Multimodal Visual Question Answering	LLaVA Evaluation Suite (GQA, MME, POPE, SQA-Img, VizWiz, VQAv2, MMB-En) 1.5	GQA	61.9	16
Large Vision-Language Model evaluation	LLaVA Evaluation Suite (MMBench, MME, MM-Vet, ScienceQA) 1.5 (test val)	MMBench	68.5	16
Overall Multimodal Performance	LLaVA 665K Evaluation Suite	Relative Score	100.3	15
Jailbreak Detection	LLaVA Vicuna-7B v1.6	Accuracy	92	13
Image Captioning	MC-LLaVA	Caption Recall (Single)	83.6	11
Vision-Language	LLaVa 1.5	GQA Score	63.01	11

Showing 25 of 68 rows