Low-Rank Similarity Mining for Multimodal Dataset Distillation

About

Though dataset distillation has developed rapidly in recent years, the distillation of multimodal data, e.g., image-text pairs, poses unique and under-explored challenges. Unlike unimodal data, image-text contrastive learning (ITC) data lack inherent categorization and should instead place greater emphasis on modality correspondence. In this work, we propose Low-Rank Similarity Mining (LoRS) for multimodal dataset distillation, which distills a ground-truth similarity matrix together with the image-text pairs and uses low-rank factorization for efficiency and scalability. The proposed approach brings significant improvements over existing algorithms for vision-language dataset distillation. We advocate adopting LoRS as a foundational synthetic data setup for image-text dataset distillation. Our code is available at https://github.com/silicx/LoRS_Distill.
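
To make the efficiency argument concrete, here is a minimal sketch of a low-rank similarity matrix. It assumes, purely for illustration, that the matrix is parameterized as the identity plus a product of two thin factors, S = I + L Rᵀ. The exact parameterization in the paper and repository may differ, and the names `low_rank_similarity` and `parameter_counts` are hypothetical rather than taken from the LoRS code.

```python
import random


def low_rank_similarity(n, rank, scale=0.01, seed=0):
    """Build an n x n matrix S = I + L @ R^T from two n x rank factors.

    Assumption for illustration only: the identity-plus-low-rank form and
    the small Gaussian initialization are not confirmed by the source.
    """
    rng = random.Random(seed)
    L = [[rng.gauss(0.0, scale) for _ in range(rank)] for _ in range(n)]
    R = [[rng.gauss(0.0, scale) for _ in range(rank)] for _ in range(n)]
    S = [
        [
            (1.0 if i == j else 0.0)
            + sum(L[i][k] * R[j][k] for k in range(rank))
            for j in range(n)
        ]
        for i in range(n)
    ]
    return S, L, R


def parameter_counts(n, rank):
    """Return (full, low_rank) learnable parameter counts for n pairs."""
    return n * n, 2 * n * rank


S, L, R = low_rank_similarity(n=8, rank=2)
full, low = parameter_counts(n=1000, rank=10)
print(full, low)  # 1000000 20000
```

Under this assumed form, the diagonal stays close to 1 for matched image-text pairs, the off-diagonal entries carry learned cross-pair similarity, and the number of learnable values grows linearly in the number of synthetic pairs instead of quadratically.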

Yue Xu, Zhilin Lin, Yusong Qiu, Cewu Lu, Yong-Lu Li • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Text-to-Image Retrieval | Flickr30K | -- | 559 |
| Text-to-Image Retrieval | Flickr30k (test) | Recall@1: 10.3 | 525 |
| Image-to-Text Retrieval | Flickr30k (test) | R@1: 14.9 | 472 |
| Text-to-Video Retrieval | DiDeMo (test) | R@1: 28.8 | 407 |
| Image Retrieval | Flickr30k (test) | R@1: 5.3 | 213 |
| Image-to-Text Retrieval | MS-COCO (test) | R@1: 5.7 | 127 |
| Text Retrieval | Flickr30K | R@1: 14.9 | 120 |
| Video-to-Text Retrieval | DiDeMo (test) | R@1: 30.3 | 111 |
| Text Retrieval | Flickr30k (test) | R@1: 5.7 | 107 |
| Text-to-Image Retrieval | MS-COCO (test) | R@1: 3 | 82 |
Showing 10 of 28 rows
