
LEMoN: Label Error Detection using Multimodal Neighbors

About

Large repositories of image-caption pairs are essential for developing vision-language models. However, these datasets are often built from noisy data scraped from the web and contain many mislabeled instances. To improve the reliability of downstream models, it is important to identify and filter out images with incorrect captions. Beyond filtering based on image-caption embedding similarity, however, no prior work has proposed other methods to filter noisy multimodal data or concretely assessed the impact of noisy captioning data on downstream training. In this work, we propose, theoretically justify, and empirically validate LEMoN, a method to identify label errors in image-caption datasets. Our method leverages the multimodal neighborhood of image-caption pairs in the latent space of contrastively pretrained multimodal models to automatically identify label errors. Through empirical evaluations across eight datasets and twelve baselines, we find that LEMoN outperforms the baselines by over 3% in label error detection, and that training on datasets filtered using our method improves downstream captioning performance by more than 2 BLEU points over training on the unfiltered noisy data.

Haoran Zhang, Aparna Balagopalan, Nassim Oufattole, Hyewon Jeong, Yan Wu, Jiacheng Zhu, Marzyeh Ghassemi • 2024
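
The abstract describes LEMoN only at a high level, so the sketch below is a hypothetical illustration of the general idea of scoring an image-caption pair by its multimodal neighborhood. It is not the paper's exact scoring function. Assumptions: image and caption embeddings are already computed by some contrastively pretrained model (for example a CLIP-style encoder); all distances are cosine distances; each neighbor term is a plain average over the k nearest neighbors; and `beta`, `gamma`, and the function names are illustrative, not taken from the paper. The direct term corresponds to the embedding-similarity baseline the abstract mentions; the two neighbor terms ask whether visually similar images carry similar captions, and whether similar captions describe similar images.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest_neighbors(vectors: Sequence[Vector], i: int, k: int) -> List[int]:
    """Indices of the k vectors closest to vectors[i], excluding i itself (brute force)."""
    others = [j for j in range(len(vectors)) if j != i]
    others.sort(key=lambda j: cosine_distance(vectors[i], vectors[j]))
    return others[:k]


def neighbor_error_scores(
    image_emb: Sequence[Vector],
    text_emb: Sequence[Vector],
    k: int = 2,
    beta: float = 1.0,   # hypothetical weight for the image-neighbor term
    gamma: float = 1.0,  # hypothetical weight for the text-neighbor term
) -> List[float]:
    """Hypothetical label-error score per pair; higher means more suspicious."""
    scores = []
    for i in range(len(image_emb)):
        # Direct term: distance between the pair's own image and caption embeddings.
        direct = cosine_distance(image_emb[i], text_emb[i])
        # Image-neighbor term: captions of visually similar images should resemble caption i.
        img_nn = nearest_neighbors(image_emb, i, k)
        s_img = sum(cosine_distance(text_emb[i], text_emb[j]) for j in img_nn) / len(img_nn)
        # Text-neighbor term: images paired with similar captions should resemble image i.
        txt_nn = nearest_neighbors(text_emb, i, k)
        s_txt = sum(cosine_distance(image_emb[i], image_emb[j]) for j in txt_nn) / len(txt_nn)
        scores.append(direct + beta * s_img + gamma * s_txt)
    return scores


# Toy 2-D embeddings: two clusters; pair 5 has a cluster-A image with a cluster-B caption.
images = [(1, 0.1), (1, 0.2), (1, 0.0), (0.1, 1), (0.2, 1), (1, 0.15)]
captions = [(1, 0.1), (1, 0.2), (1, 0.05), (0.1, 1), (0.2, 1), (0.1, 1)]
scores = neighbor_error_scores(images, captions, k=2)
flagged = max(range(len(scores)), key=scores.__getitem__)  # flagged == 5
```

The neighbor search here is a brute-force O(N²) scan to keep the sketch self-contained; at web-dataset scale one would presumably use an approximate nearest-neighbor index instead.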

Related benchmarks

Task | Dataset | Metric | Result | Rank
Mislabeled Data Detection | DeepDRiD | F1 score | 69.55 | 55
Mislabeled Data Detection | ISIC | F1 score | 75.35 | 55
Mislabeled Data Detection | Panda | F1 score | 75.58 | 55
Noisy label detection | CIFAR-100N natural label noise (train) | F1 score | 78.4 | 19
Mislabeled sample detection | CheXpert | F1 score | 79.42 | 11
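
The benchmarks above report F1 scores, which treat label error detection as binary classification with "mislabeled" as the positive class. The sketch below shows how such an F1 score could be computed from per-pair scores; the page does not state how the benchmarks choose the flagging threshold, so `threshold` and the function name `detection_f1` are hypothetical.

```python
from typing import Sequence, Tuple


def detection_f1(
    scores: Sequence[float], is_mislabeled: Sequence[bool], threshold: float
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 when pairs scoring above `threshold` are flagged as mislabeled."""
    flagged = [s > threshold for s in scores]
    tp = sum(f and y for f, y in zip(flagged, is_mislabeled))        # correctly flagged errors
    fp = sum(f and not y for f, y in zip(flagged, is_mislabeled))    # clean pairs flagged
    fn = sum((not f) and y for f, y in zip(flagged, is_mislabeled))  # errors missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: one true error caught, one clean pair wrongly flagged, one error missed.
p, r, f = detection_f1([0.1, 0.9, 0.4, 0.8, 0.2], [False, True, True, False, False], 0.5)
# p == 0.5, r == 0.5, f == 0.5
```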
