
Multimodal Representation Learning Conditioned on Semantic Relations

About

Multimodal representation learning has been largely driven by contrastive models such as CLIP, which learn a shared embedding space by aligning paired image-text samples. While effective for general-purpose representation learning, such models typically produce a single embedding per sample that is reused across different semantic relations and contexts. However, in many real-world applications, relevance between samples is inherently relation-dependent, with different semantic relations emphasizing different aspects of multimodal data. In this work, we propose Relation-Conditioned Multimodal Learning (RCML), a framework that treats semantic relations as explicit conditions of multimodal representation learning. Rather than producing relation-agnostic embeddings, RCML learns representations conditioned on natural-language relation descriptions, allowing the same sample to be represented differently under different relational contexts. The framework constructs relation-aware training pairs, introduces a relation-conditioned module to adapt embeddings to relation semantics, and employs a unified contrastive objective to jointly model cross-modal alignment and relation-induced inter-sample structure. Experiments on multiple datasets show that RCML consistently outperforms strong baselines on retrieval and classification tasks in zero-shot, fine-tuned, and out-of-domain settings, highlighting the effectiveness of leveraging semantic relations to guide multimodal representation learning.
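The idea of conditioning an embedding on a relation and training with a contrastive objective can be illustrated with a minimal sketch. The abstract does not specify RCML's conditioning module or loss, so everything here is an assumption for illustration only: the additive `condition` function, the InfoNCE-style `info_nce` loss, the temperature value, and the toy vectors are hypothetical stand-ins, not the authors' implementation.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def condition(sample_emb, relation_emb):
    # Hypothetical conditioning (an assumption, not the paper's module):
    # shift the sample embedding by the relation embedding, then renormalize.
    return normalize([s + r for s, r in zip(sample_emb, relation_emb)])

def info_nce(anchors, positives, temperature=0.07):
    # Standard InfoNCE: anchors[i] is paired with positives[i];
    # all other positives in the batch act as negatives.
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)

# Toy data: two image embeddings, two text embeddings, two relation embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[0.9, 0.1], [0.1, 0.9]]
rel_a = [0.1, 0.1]    # stands in for one encoded relation description
rel_b = [-0.5, 0.5]   # stands in for a different relation description

# The same sample gets a different representation under each relation.
img0_under_a = condition(images[0], rel_a)
img0_under_b = condition(images[0], rel_b)

# Contrastive loss over relation-conditioned pairs (relation rel_a).
cond_images = [condition(v, rel_a) for v in images]
cond_texts = [condition(v, rel_a) for v in texts]
loss_aligned = info_nce(cond_images, cond_texts)
loss_shuffled = info_nce(cond_images, list(reversed(cond_texts)))
```

In this toy setup the correctly paired batch yields a lower loss than the shuffled batch, which is the behavior a contrastive objective rewards. A learned conditioning module would replace the fixed additive shift used here.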

Yang Qiao, Yuntong Hu, Bowen Zhu, Hasibul Haque, Liang Zhao • 2025

Related benchmarks

Task                            Dataset   Result        Rank
Relation-Conditioned Retrieval  Elec      Hit@5 49.32   30
Relation-Conditioned Retrieval  Auto      Hit@5 44.38   30
Relation-Conditioned Retrieval  OFFICE    Hit@5 49.64   30
Relation-Conditioned Retrieval  Baby      Hit@5 44.65   30
Relation-Conditioned Retrieval  Pet       Hit@5 55.08   30
Relation-Conditioned Retrieval  music     Hit@5 51.53   30
Relation-Conditioned Retrieval  Sports    Hit@5 64.81   30
Relation-Conditioned Retrieval  Goodread  Hit@5 53.26   30
Relation-Conditioned Retrieval  Elec      MRR 32.95     30
Relation-Conditioned Retrieval  Auto      MRR 30.34     30
Showing 10 of 15 rows
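
The two metrics in the table, Hit@5 and MRR, can be computed as in the sketch below. It assumes each query has exactly one relevant target and that scores are reported as percentages; the page states neither, and the function names and toy queries are hypothetical.

```python
from typing import List, Sequence, Tuple

def hit_at_k(ranked_ids: Sequence[str], relevant_id: str, k: int = 5) -> float:
    """Return 1.0 if the relevant item appears in the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str) -> float:
    """Return 1/rank of the relevant item (1-indexed), or 0.0 if it is absent."""
    for position, item in enumerate(ranked_ids, start=1):
        if item == relevant_id:
            return 1.0 / position
    return 0.0

def evaluate(queries: List[Tuple[Sequence[str], str]], k: int = 5) -> Tuple[float, float]:
    """Average Hit@k and MRR over (ranked_ids, relevant_id) pairs, as percentages."""
    n = len(queries)
    hit = sum(hit_at_k(ranked, rel, k) for ranked, rel in queries) / n
    mrr = sum(reciprocal_rank(ranked, rel) for ranked, rel in queries) / n
    return 100.0 * hit, 100.0 * mrr

# Toy example: the first query finds its target at rank 2, the second at rank 6.
toy_queries = [
    (["a", "b", "c"], "b"),
    (["x", "y", "z", "w", "v", "u"], "u"),
]
hit5, mrr = evaluate(toy_queries, k=5)
```

Hit@5 only checks whether the target lands in the top five, while MRR also rewards placing it nearer the top, which is why the two scores for the same dataset can differ substantially.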
