
FOIL

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Hallucination Detection | FOIL | Accuracy (4 Refs): 98.4 | 32 |
| Image Captioning Hallucination Detection | FOIL (test) | Accuracy: 95.4 | 28 |
| Image-Text Matching | Foil | AURC: 0.245 | 23 |
| Hallucination Detection | FOIL | Accuracy: 92.6 | 18 |
| Image Caption Evaluation | FOIL (4-ref) | Accuracy: 94.64 | 15 |
| Image Caption Evaluation | FOIL 1-ref | Accuracy: 90.94 | 15 |
| Visual Reasoning | FOIL 30K captions derived from COCO | Object Accuracy: 84.3 | 10 |
| Object Hallucination Detection | FOIL (test) | Accuracy: 98.4 | 9 |
| Hallucination Detection | FOIL 4-ref | Accuracy: 98.4 | 6 |
| Hallucination Detection | FOIL (1-ref) | Accuracy: 98.2 | 6 |
| Image Captioning Evaluation | FOIL | Accuracy (1-ref): 96.7 | 6 |
| Foil Detection | FOIL-nocaps (Out of Domain) | FDR: 80.2 | 6 |
| Foil Detection | FOIL nocaps (In Domain) | FDR: 78.9 | 6 |
| Foil Detection | FOIL nocaps (Overall) | FDR: 18.6 | 6 |