
Noisy Correspondence Learning with Meta Similarity Correction

About

Despite the success of multimodal learning in cross-modal retrieval tasks, this remarkable progress relies on correct correspondences among multimedia data. However, collecting such ideal data is expensive and time-consuming. In practice, most widely used datasets are harvested from the Internet and inevitably contain mismatched pairs. Training on such noisy-correspondence datasets degrades performance because cross-modal retrieval methods can wrongly enforce mismatched data to be similar. To tackle this problem, we propose a Meta Similarity Correction Network (MSCN) that provides reliable similarity scores. We view a binary classification task as the meta-process, which encourages the MSCN to learn discrimination from positive and negative meta-data. To further alleviate the influence of noise, we design an effective data purification strategy that uses meta-data as prior knowledge to remove noisy samples. Extensive experiments demonstrate the strengths of our method under both synthetic and real-world noise, on Flickr30K, MS-COCO, and Conceptual Captions.
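To make the noisy-correspondence setting concrete, the sketch below simulates image-text pairs where some pairs are mismatched, and applies a toy similarity-based purification step. This is only an illustration of the problem setup, not the paper's MSCN: the function names (`cosine_sim`, `purify`) and the fixed threshold are hypothetical, and the actual method learns a corrected similarity via a meta-network rather than thresholding raw cosine similarity.

```python
import numpy as np

def cosine_sim(a, b):
    """Row-wise cosine similarity between paired image/text embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)

def purify(image_emb, text_emb, threshold=0.5):
    """Toy purification: keep only pairs whose similarity exceeds a
    threshold, loosely mimicking a step that drops likely-mismatched
    image-text pairs before training. (Hypothetical, not the paper's API.)"""
    sims = cosine_sim(image_emb, text_emb)
    keep = sims >= threshold
    return image_emb[keep], text_emb[keep], sims

rng = np.random.default_rng(0)
matched = rng.normal(size=(4, 8))
# First 4 pairs are matched (text is a slightly perturbed copy of the image
# embedding); the last 2 pairs are mismatched (independent random vectors).
images = np.vstack([matched, rng.normal(size=(2, 8))])
texts = np.vstack([matched + 0.05 * rng.normal(size=(4, 8)),
                   rng.normal(size=(2, 8))])

imgs_clean, txts_clean, sims = purify(images, texts)
print(f"kept {len(imgs_clean)} of {len(images)} pairs")
```

Matched pairs score near 1.0 and survive the filter, while mismatched random pairs tend to score near 0 and are dropped; a learned correction network replaces the crude fixed threshold in the real method.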

Haochen Han, Kaiyao Miao, Qinghua Zheng, Minnan Luo • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Retrieval | Flickr30K | R@1 | 59.6 | 531 |
| Text-to-Image Retrieval | Flickr30k (test) | Recall@1 | 59.6 | 445 |
| Image-to-Text Retrieval | Flickr30K | R@1 | 77.4 | 429 |
| Image-to-Text Retrieval | Flickr30k (test) | R@1 | 77.4 | 392 |
| Text-to-Image Retrieval | MS-COCO | R@1 | 64.3 | 151 |
| Image-to-Text Retrieval | MS-COCO | R@1 | 78.1 | 132 |
| Image-to-Text Retrieval | MS-COCO 1K (test) | R@1 | 78.1 | 121 |
| Text-to-Image Retrieval | MS-COCO 1K (test) | R@1 | 64.3 | 53 |
| Text-to-Image Retrieval | MS COCO 1K | R@1 | 64.3 | 51 |
| Image-to-Text Retrieval | CC152K | R@1 | 40.1 | 48 |

Showing 10 of 18 rows.
