Captions Speak Louder than Images: Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data

About

Leveraging multimodal data to drive breakthroughs in e-commerce applications through Multimodal Foundation Models (MFMs) is gaining increasing attention from the research community. However, there are significant challenges that hinder the optimal use of multimodal e-commerce data by foundation models: (1) the scarcity of large-scale, high-quality multimodal benchmark datasets; and (2) the lack of effective multimodal information integration methods. To address these challenges, in this paper, we introduce MMECInstruct, the first-ever, large-scale, and high-quality multimodal instruction dataset for e-commerce. We also develop CASLIE, a simple, lightweight, yet effective framework for integrating multimodal information for e-commerce. Leveraging MMECInstruct, we fine-tune a series of e-commerce MFMs within CASLIE, denoted as CASLIE models. Our comprehensive evaluation demonstrates that CASLIE models substantially outperform 5 categories of advanced baseline models in the in-domain evaluation. Moreover, CASLIE models show strong generalizability to out-of-domain settings. MMECInstruct and CASLIE models are publicly accessible through https://ninglab.github.io/CASLIE/.

Xinyi Ling, Hanwen Du, Bo Peng, Zhihui Zhu, Xia Ning • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Retrieval | Fashion200k (test) | Recall@1 | 4.71 | 58 |
| Multimodal Retrieval (text query to multimodal candidate) | MBE 2.0 | R@1 | 26.32 | 50 |
| Multimodal Retrieval | M5Product | Recall@1 | 8.4 | 30 |
| Multimodal Retrieval (text query to multimodal content) | M5Product (test) | Recall@1 | 8.4 | 26 |
| Classification | M5Product | Accuracy | 38.16 | 24 |
| Product Classification | Fashion200k | Accuracy | 54.88 | 23 |
| Image-to-Text Retrieval | Fashion200k | R@10 | 13.89 | 18 |
| Text-to-Image Retrieval | Fashion200k | Recall@10 | 14.12 | 18 |
| Multimodal Retrieval (image query to multimodal content) | M5Product (test) | Recall@1 | 8 | 13 |
| Multimodal Retrieval (q^i -> e^mm) | MBE 3.0 1.0 (test) | Recall@1 | 9.02 | 13 |
Showing 10 of 19 rows
