| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Text Membership Inference Attack | LLaVA LLM Pre-training | AUC 0.688 | 88 |
| Text Membership Inference Attack | LLaVA VLLM Tuning | AUC 0.993 | 44 |
| Multimodal Understanding | LLaVA Evaluation Suite GQA, MMB, MMB-CN, MME, POPE, SQA, VQAV2, VQAText, VizWiz | GQA 64.2 | 41 |
| Multimodal Understanding | LLaVA Evaluation Suite 1.5 | GQA 63.2 | 32 |
| Visual Instruction Following | LLaVA-W | Score 102 | 28 |
| Multimodal Large Language Model Inference Efficiency | LLaVA 13B 1.5 (test) | TTFT (ms) 60.2 | 21 |
| Hallucination detection | llava | AUC ROC 96.5 | 19 |
| Vision-Language Understanding and Reasoning | LLaVA Multimodal Evaluation Suite (GQA, MMBench, MME, POPE, ScienceQA, VQAv2, TextVQA, SEED-Bench, MM-Vet, VizWiz) 1.5 (test/val) | GQA 0.619 | 16 |
| Large Vision-Language Model evaluation | LLaVA Evaluation Suite (MMBench, MME, MM-Vet, ScienceQA) 1.5 (test val) | MMBench 68.5 | 16 |
| Jailbreak Detection | LLaVA Vicuna-7B v1.6 | Accuracy 92 | 13 |
| Image Captioning | MC-LLaVA | Caption Recall (Single) 83.6 | 11 |
| Vision-Language | LLaVa 1.5 | GQA Score 63.01 | 11 |
| Vision Understanding | LLaVA-W | Score 63 | 10 |
| Large Vision-Language Model Evaluation | LLAVA (bench) | Score 77.8 | 10 |
| Adversarial Attack | LLaVA | CLIP Similarity (RN-50) 0.2427 | 9 |
| Pointwise Scoring | LLaVA-W pointwise | Kendall's Tau 0.949 | 9 |
| Multimodal Instruction Following | LLaVA Wilder | Score 92 | 9 |
| Inference Efficiency | LLaVA 7B 1.5 | Latency (ms) 802.65 | 8 |
| Vision Understanding | LLaVA-Wild | LLaVA-Wild Accuracy 74.2 | 8 |
| Communication Cost Analysis | LLaVA 1.5 | Total Latency (s) 97.268 | 7 |
| Vision-Language Evaluation | LLaVA-Wilder | Accuracy 83.7 | 7 |
| Open-ended Visual Question Answering | LLaVA Eval v1 (test) | Conversation Score 77.67 | 7 |
| Black-Box Adversarial Attack | LLaVA 1.5 | KMR (a) 0.96 | 6 |
| Inference Efficiency | LLaVA-NeXT Inference | Inference Time (s) 7.998 | 6 |
| Knowledge Transfer | LLaVA Evaluation Suite Flickr30k 1.5 | VQAv2 Accuracy 78.52 | 6 |