| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| CHAIR | M3ID | CHAIR_s 72.8 | 393 | 14d ago | |
| MMHal-Bench | Qwen2.5-VL + FINER-Tuning | MMHal Score 4.7 | 306 | 5d ago | |
| AMBER | mPlug-Owl | CHAIR 24.5 | 222 | 1d ago | |
| POPE | MIRROR (ours) | Accuracy 94.42 | 217 | 6d ago | |
| HallusionBench | MIRROR (ours) | Accuracy 82.02 | 153 | 2d ago | |
| Object HalBench | LLaVA-1.5-7B | CHAIR Score (s) 54.7 | 78 | 8d ago | |
| CHAIR MSCOCO | VCD | CHAIR_S 59.4 | 64 | 2mo ago | |
| MME Hallucination | DAC | Existence Score 195 | 61 | 2d ago | |
| HaluEval | Multi-DPOP | Accuracy (ACC) 100 | 51 | 14d ago | |
| HallBench | ToR-DAPO | Accuracy 73.6 | 49 | 6d ago | |
| CHAIR MSCOCO 2014 (val) | Dola | CHAIRi 26.2 | 45 | 14d ago | |
| A-HaluEval | Multi-DPOP | A-Accuracy 91.5 | 40 | 22d ago | |
| MMHal | | Score 4.2 | 37 | 1mo ago | |
| MSCOCO (val) | SparseVLM | CHAIR_i 23.04 | 36 | 3mo ago | |
| POPE Adversarial v1.0 (test) | ResDec | Accuracy 88.96 | 31 | 3mo ago | |
| POPE Popular v1.0 (test) | ResDec | Accuracy 90.34 | 31 | 3mo ago | |
| POPE Random v1.0 (test) | ResDec | Accuracy 91.17 | 31 | 3mo ago | |
| CHAIR MSCOCO 2014 | | CHAIRs Score 51.3 | 28 | 2mo ago | |
| COCO | | CS 53 | 28 | 2mo ago | |
| AMBER Generative Task | GPT-4V | Coverage 67.1 | 26 | 8d ago | |
| MME hallucination (test) | VASparse | Existence Score 180 | 24 | 3mo ago | |
| CRPE relation | InternVL2.5-26B | Accuracy 79.1 | 23 | 3mo ago | |
| POPE Overall | InternVL3-8B | Accuracy 91.1 | 21 | 19d ago | |
| MSCOCO | VCD | CS Score 55.2 | 21 | 2mo ago | |
| MOH | baseline | HR^D 69.5 | 21 | 3mo ago | |