| Dataset | SOTA Method | Metric | Value | Papers | Updated |
|---|---|---|---|---|---|
| POPE Adversarial | LogicCheckGPT | Accuracy | 90 | 288 | 3d ago |
| POPE (Random) | — | F1 Score | 93.02 | 285 | 3d ago |
| POPE Popular | LogicCheckGPT | F1 Score | 91.4 | 273 | 3d ago |
| MSCOCO 500 images 2014 (val) | — | Consistency Score (CS) | 60.6 | 50 | 1mo ago |
| COCO captions 2014 (val) | — | CHAIR (scene) | 12.3 | 35 | 1mo ago |
| MSCOCO POPE (test) | HGAI | Accuracy (Random) | 90.7 | 32 | 1mo ago |
| A-OKVQA POPE (test) | OSGA | Accuracy (Random) | 90.13 | 29 | 1mo ago |
| COCO 512-token budget (test) | VCD | Consistency Score | 62.9 | 24 | 1mo ago |
| COCO 64-token budget (test) | VCD | CS | 0.328 | 24 | 1mo ago |
| POPE Adversarial v1.0 | ONLY + VDC | Accuracy | 84.4 | 24 | 1mo ago |
| POPE Popular v1.0 | ONLY + VDC | Accuracy | 88.03 | 24 | 1mo ago |
| POPE v1.0 (Random) | ONLY + VDC | Accuracy | 90.07 | 24 | 1mo ago |
| MME | ICT | E Score | 195 | 22 | 1mo ago |
| POPE average across COCO, A-OKVQA, GQA | AFTER | ACC | 85.7 | 22 | 1mo ago |
| MSCOCO (test) | Nullu | Accuracy | 79.52 | 21 | 1mo ago |
| POPE (test) | InfiMM-HD | POPE Score | 87.9 | 12 | 1mo ago |
| POPE (test) | — | POPE Score | 86.3 | 3 | 1mo ago |
| Hallucination in Captioning | — | CHi | 3.2 | 3 | 1mo ago |
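Most POPE rows above report Accuracy and F1 on yes/no object-probing questions ("Is there a dog in the image?"). As a rough sketch of how such scores come about, the helper below computes accuracy and F1 over binary answers, treating "yes" as the positive class; the function name and inputs are illustrative, not the official POPE evaluation script.

```python
def pope_scores(predictions, labels):
    """Accuracy and F1 (in %) for POPE-style yes/no answers.

    predictions, labels: equal-length lists of "yes"/"no" strings.
    "yes" is the positive class (object claimed present).
    """
    pairs = list(zip(predictions, labels))
    tp = sum(p == "yes" and l == "yes" for p, l in pairs)
    fp = sum(p == "yes" and l == "no" for p, l in pairs)
    fn = sum(p == "no" and l == "yes" for p, l in pairs)
    accuracy = sum(p == l for p, l in pairs) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy * 100, f1 * 100

# One false positive out of four probes: 75% accuracy.
acc, f1 = pope_scores(["yes", "no", "yes", "no"],
                      ["yes", "no", "no", "no"])
```

The adversarial/popular/random splits in the table differ only in how the negative (absent-object) questions are sampled; the scoring itself is identical across splits.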