UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels

About

Contrastive objectives power state-of-the-art multimodal models, but training them remains slow because it relies on lengthy stochastic optimization. We propose a Unified Framework for Efficient Contrastive Alignment via Kernels (UniCon), which covers linear and nonlinear encoders as well as one-to-one and many-to-many alignment. At its core, UniCon introduces the contrastive similarity weight matrix $S(\gamma)$, which enables closed-form global solutions that provably replace minibatch back-propagation with exact updates. Through the lens of reproducing kernel Hilbert spaces (RKHS), UniCon offers a kernelized perspective that unifies contrastive alignment and reveals its connection to spectral methods. To validate the theory, we conduct experiments on synthetic, unimodal, multimodal, and zero-shot tasks, showing that UniCon achieves substantial efficiency gains while retaining generality and strong empirical performance.

Hangke Sui, Yuqing Wang, Minh N Do • 2026
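This page does not give UniCon's definition of $S(\gamma)$, so the following is only a hedged illustration of the general idea behind "closed-form global solutions" and the "connection to spectral methods": for linear encoders, an alignment objective weighted by a fixed $n \times n$ pair-weight matrix is maximized exactly by an SVD, with no minibatch gradient steps. The stand-in $S(\gamma) = I - \tfrac{\gamma}{n}\mathbf{1}\mathbf{1}^\top$ (reward matched pairs, uniformly penalize all pairs) and the helper names `similarity_weights` and `closed_form_linear_alignment` are hypothetical assumptions, not the authors' method.

```python
import numpy as np


def similarity_weights(n: int, gamma: float) -> np.ndarray:
    """Hypothetical stand-in for S(gamma), NOT UniCon's definition:
    +1 on matched pairs (diagonal) minus a uniform gamma/n on every pair."""
    return np.eye(n) - (gamma / n) * np.ones((n, n))


def closed_form_linear_alignment(X: np.ndarray, Y: np.ndarray,
                                 S: np.ndarray, k: int):
    """Maximize tr(Wx^T X^T S Y Wy) over Wx, Wy with orthonormal columns.

    The objective equals sum_ij S_ij <Wx^T x_i, Wy^T y_j>, a weighted sum of
    cross-modal similarities (positives rewarded, all pairs penalized here).
    Its global maximum is attained by the top-k singular vectors of
    M = X^T S Y, so the solution is exact rather than iterative.
    """
    M = X.T @ S @ Y                          # (dx, dy) pair-weighted cross-moment
    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    Wx = U[:, :k]                            # (dx, k) projection for modality x
    Wy = Vt[:k].T                            # (dy, k) projection for modality y
    return Wx, Wy, float(sigma[:k].sum())    # optimum = sum of top-k singular values


# Usage on synthetic paired data that share a k-dimensional latent factor.
rng = np.random.default_rng(0)
n, dx, dy, k = 200, 16, 12, 4
Z = rng.normal(size=(n, k))
X = Z @ rng.normal(size=(k, dx)) + 0.1 * rng.normal(size=(n, dx))
Y = Z @ rng.normal(size=(k, dy)) + 0.1 * rng.normal(size=(n, dy))

S = similarity_weights(n, gamma=1.0)
Wx, Wy, best = closed_form_linear_alignment(X, Y, S, k)

# Any other orthonormal projections score no higher than the closed form.
Qx, _ = np.linalg.qr(rng.normal(size=(dx, k)))
Qy, _ = np.linalg.qr(rng.normal(size=(dy, k)))
print(best >= np.trace(Qx.T @ X.T @ S @ Y @ Qy))   # True
```

With gamma = 1 this stand-in S is the centering matrix, so M is proportional to the centered cross-covariance and the solution coincides with PLS-SVD; a different weighting changes only M, not the SVD solver.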

Related benchmarks

Task	Dataset	Result
Text-to-Image Retrieval	Flickr30K	R@1 42.1	607
Text-to-Image Retrieval	Flickr30k (test)	--	528
Image-to-Text Retrieval	Flickr30k (test)	--	472
Text-to-Image Retrieval	MSCOCO 5K (test)	R@1 29.2	312
Image-to-Text Retrieval	MSCOCO 5K (test)	R@1 32.9	68
Text-to-Image Retrieval	MSCOCO (5K)	R@1 29.2	51
Audio-to-Text Retrieval	Clotho	R@1 3.35	49
Image-Text Retrieval	Flickr30k (test)	--	45
Image-to-Text Retrieval	MSCOCO (5K)	R@1 32.9	42
Text-to-Audio Retrieval	Clotho	R@1 0.0249	31

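The results above are reported as R@1 (Recall@1): the share of queries whose correct match is ranked first among all candidates. A minimal sketch of the standard Recall@K computation from a query-by-candidate similarity matrix follows; the function name `recall_at_k` is illustrative, and the assumption of exactly one correct candidate per query is a simplification of the multi-caption protocols these benchmarks use.

```python
import numpy as np


def recall_at_k(sim: np.ndarray, k: int = 1) -> float:
    """Recall@K for retrieval with one correct candidate per query.

    sim[i, j] is the score of candidate j for query i; the correct candidate
    for query i is assumed to be candidate i (a simplification: Flickr30K and
    MSCOCO pair each image with five captions).
    """
    correct = np.diag(sim)[:, None]          # score of the true match per query
    rank = (sim > correct).sum(axis=1)       # 0-based rank; ties count in the query's favor
    return float((rank < k).mean())          # fraction of queries hit within the top k


sim = np.array([[0.9, 0.1, 0.0],
                [0.8, 0.5, 0.2],
                [0.1, 0.3, 0.7]])
print(round(100 * recall_at_k(sim, k=1), 1))  # 66.7
print(recall_at_k(sim, k=2))                  # 1.0
```

Query 1 ranks its true match second, so it misses at K = 1 but counts at K = 2. Results are commonly reported as percentages, i.e. multiplied by 100.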