
UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels

About

Contrastive objectives power state-of-the-art multimodal models, but their training remains slow, relying on long stochastic optimization. We propose a Unified Framework for Efficient Contrastive Alignment via Kernels (UniCon), which spans linear and nonlinear encoders as well as one-to-one and many-to-many alignments. At its core, UniCon introduces the contrastive similarity weight matrix $S(\gamma)$, which enables closed-form global solutions that provably replace minibatch back-propagation with exact updates. Through the lens of reproducing kernel Hilbert spaces (RKHS), UniCon provides a kernelized perspective that unifies contrastive alignment and reveals its connection to spectral methods. To validate the theory, we conduct experiments on synthetic, unimodal, multimodal, and zero-shot tasks, demonstrating that UniCon achieves substantial efficiency gains while preserving generality and strong empirical performance.
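The abstract does not define $S(\gamma)$ or the exact closed-form solution, so the following is only a generic sketch of what a spectral, closed-form linear alignment can look like. It assumes that the linear encoders are taken from the top singular vectors of a weighted cross-covariance $X^\top S Y / n$, which maximizes $\mathrm{tr}(W_x^\top C W_y)$ over orthonormal $W_x, W_y$. The function name `spectral_align`, the identity default for `S`, and the centering step are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def spectral_align(X, Y, S=None, rank=2):
    """Closed-form linear alignment of paired features (hypothetical sketch).

    X: (n, dx) features from modality A; Y: (n, dy) features from modality B.
    S: (n, n) weight matrix over sample pairs. Identity (one-to-one pairs)
       is an assumed placeholder, not the paper's S(gamma).
    Returns projection matrices Wx (dx, rank), Wy (dy, rank) and the
    corresponding singular values.
    """
    n = X.shape[0]
    if S is None:
        S = np.eye(n)
    # Center each modality so the product below is a cross-covariance.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Weighted cross-covariance between the two modalities.
    C = Xc.T @ S @ Yc / n
    # The top singular vector pairs maximize tr(Wx^T C Wy) under
    # orthonormality constraints, so no gradient steps are needed.
    U, sing, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, :rank], Vt[:rank].T, sing[:rank]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    # Second "modality": a rotated copy of the first, for illustration.
    R, _ = np.linalg.qr(rng.normal(size=(5, 5)))
    Y = X @ R
    Wx, Wy, sing = spectral_align(X, Y, rank=3)
    print("singular values:", sing)
```

In this toy setting the two projected views coincide because `Y` is an exact rotation of `X`; real multimodal features would only be approximately aligned, and a many-to-many setting would need a non-identity `S`.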

Hangke Sui, Yuqing Wang, Minh N Do • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Text-to-Image Retrieval | Flickr30K | R@1 42.1 | 559 |
| Text-to-Image Retrieval | Flickr30k (test) | -- | 525 |
| Image-to-Text Retrieval | Flickr30k (test) | -- | 472 |
| Text-to-Image Retrieval | MSCOCO 5K (test) | R@1 29.2 | 312 |
| Image-to-Text Retrieval | MSCOCO 5K (test) | R@1 32.9 | 68 |
| Text-to-Image Retrieval | MSCOCO (5K) | R@1 29.2 | 51 |
| Audio-to-Text Retrieval | Clotho | R@1 3.35 | 49 |
| Image-Text Retrieval | Flickr30k (test) | -- | 45 |
| Image-to-Text Retrieval | MSCOCO (5K) | R@1 32.9 | 42 |
| Text-to-Audio Retrieval | Clotho | R@1 0.0249 | 31 |
Showing 10 of 13 rows
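
For readers unfamiliar with the retrieval metric in the results above, R@1 is Recall@K with K = 1: the fraction of queries whose correct match appears among the top K retrieved candidates. The sketch below is a minimal illustration, assuming a square similarity matrix in which query i has exactly one correct candidate at index i. Benchmarks such as Flickr30K and MSCOCO pair each image with several captions, so their official evaluation protocols differ in detail; the function name `recall_at_k` is hypothetical.

```python
import numpy as np

def recall_at_k(sim, k=1):
    """Recall@K for one-to-one retrieval (simplifying assumption).

    sim[i, j] is the similarity between query i and candidate j;
    the correct candidate for query i is assumed to be j = i.
    Returns a fraction in [0, 1].
    """
    sim = np.asarray(sim, dtype=float)
    # Score of the ground-truth candidate for each query.
    gt = np.diag(sim)[:, None]
    # Zero-based rank: number of candidates scoring strictly higher.
    ranks = (sim > gt).sum(axis=1)
    # A query counts as a hit if its ground truth is within the top k.
    return float((ranks < k).mean())

if __name__ == "__main__":
    sim = [[0.9, 0.1, 0.0],
           [0.2, 0.8, 0.3],
           [0.7, 0.1, 0.5]]
    print(recall_at_k(sim, k=1))  # two of three queries rank their match first
    print(recall_at_k(sim, k=2))
```

Text-to-image and image-to-text scores come from the same similarity matrix read in the two directions, for example `recall_at_k(sim)` and `recall_at_k(np.asarray(sim).T)`.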
