| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| POPE (Random) | | F1 Score | 93.02 | 200 | 2d ago |
| POPE Adversarial | LogicCheckGPT | Accuracy | 90 | 196 | 2d ago |
| POPE Popular | LogicCheckGPT | F1 Score | 91.4 | 188 | 2d ago |
| MSCOCO 500 images 2014 (val) | | Consistency Score (CS) | 60.6 | 50 | 3d ago |
| COCO 512-token budget (test) | VCD | Consistency Score | 62.9 | 24 | 3d ago |
| COCO 64-token budget (test) | VCD | CS | 0.328 | 24 | 3d ago |
| POPE Adversarial v1.0 | ONLY + VDC | Accuracy | 84.4 | 24 | 3d ago |
| POPE Popular v1.0 | ONLY + VDC | Accuracy | 88.03 | 24 | 3d ago |
| POPE v1.0 (Random) | ONLY + VDC | Accuracy | 90.07 | 24 | 3d ago |
| MME | ICT | E Score | 195 | 22 | 3d ago |
| POPE average across COCO, A-OKVQA, GQA | AFTER | ACC | 85.7 | 22 | 3d ago |
| MSCOCO (test) | Nullu | Accuracy | 79.52 | 21 | 3d ago |
| POPE (test) | InfiMM-HD | POPE Score | 87.9 | 12 | 3d ago |
| MSCOCO POPE (test) | SDCD | Accuracy (Random) | 85.9 | 11 | 3d ago |
| A-OKVQA POPE (test) | OSGA | Accuracy (Random) | 90.13 | 8 | 3d ago |
| POPE (test) | | POPE Score | 86.3 | 3 | 3d ago |
| Hallucination in Captioning | | CHi | 3.2 | 3 | 3d ago |