| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Hallucination Detection | FOIL | Accuracy (4 Refs) | 98.4 | 32 |
| Image Captioning Hallucination Detection | FOIL (test) | Accuracy | 95.4 | 28 |
| Image-Text Matching | Foil | AURC | 0.245 | 23 |
| Hallucination Detection | FOIL | Accuracy | 92.6 | 18 |
| Image Caption Evaluation | FOIL (4-ref) | Accuracy | 94.64 | 15 |
| Image Caption Evaluation | FOIL 1-ref | Accuracy | 90.94 | 15 |
| Visual Reasoning | FOIL 30K captions derived from COCO | Object Accuracy | 84.3 | 10 |
| Object Hallucination Detection | FOIL (test) | Accuracy | 98.4 | 9 |
| Hallucination Detection | FOIL 4-ref | Accuracy | 98.4 | 6 |
| Hallucination Detection | FOIL (1-ref) | Accuracy | 98.2 | 6 |
| Image Captioning Evaluation | FOIL | Accuracy (1-ref) | 96.7 | 6 |
| Foil Detection | FOIL-nocaps (Out of Domain) | FDR | 80.2 | 6 |
| Foil Detection | FOIL nocaps (In Domain) | FDR | 78.9 | 6 |
| Foil Detection | FOIL nocaps (Overall) | FDR | 18.6 | 6 |