Beyond the Unit Hypersphere: Embedding Magnitude in Contrastive Learning

About

Cosine similarity is prevalent in contrastive learning, yet it assumes embedding magnitude is noise. We systematically study magnitude learning through a framework that independently controls query-side and document-side normalization, with three main findings. First, magnitude learning benefits retrieval and Retrieval-Augmented Generation (RAG), where queries and documents have distinct roles, but not Semantic Textual Similarity (STS) or CLIP, where inputs are interchangeable. Second, query and document magnitudes serve different roles: document magnitude scales inference scores, while query magnitude modulates training gradients. Normalizing exactly one side consistently outperforms normalizing both, and the condition number of the Fisher Information Matrix predicts which side to normalize. Third, magnitude learning improves out-of-domain generalization more than in-domain performance (gains up to +72% vs. +7%), and requires retrieval-specialized pre-training or sufficient data. These findings provide practical guidance for retrieval and RAG across text and vision domains.
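The core knob here is per-side L2 normalization of the embeddings before the dot product: normalizing both sides recovers cosine similarity, while leaving one side unnormalized lets that side's magnitude enter the score. A minimal PyTorch sketch of this idea with an in-batch-negatives InfoNCE loss follows; the function names, temperature, and dimensions are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def score(q, d, normalize_query=True, normalize_doc=False):
    """Similarity with independently controlled per-side L2 normalization.

    Normalizing both sides recovers cosine similarity; leaving a side
    unnormalized lets its embedding magnitude affect the score.
    """
    if normalize_query:
        q = F.normalize(q, dim=-1)
    if normalize_doc:
        d = F.normalize(d, dim=-1)
    return q @ d.T  # (num_queries, num_docs) similarity matrix

def info_nce_loss(q, d, temperature=0.05, **norm_flags):
    # Standard in-batch-negatives contrastive loss: for each query,
    # the matching document (the diagonal entry) is the positive.
    logits = score(q, d, **norm_flags) / temperature
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, targets)

# Example: query-side normalization only, so document magnitude is kept.
# This is one of the four normalization configurations the framework compares.
q = torch.randn(8, 768)  # query embeddings
d = torch.randn(8, 768)  # document embeddings (row i matches query i)
loss = info_nce_loss(q, d, normalize_query=True, normalize_doc=False)
```

Setting both flags to True reduces this to the usual cosine-similarity objective, so the framework contains the standard setup as a special case.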

Xincan Feng, Taro Watanabe · 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Information Retrieval | BEIR | Average NDCG@10 | 0.4401 | 62 |
| Information Retrieval | TREC DL 19 | NDCG@10 | 60.43 | 61 |
| Information Retrieval | TREC DL 20 | NDCG@10 | 59.69 | 50 |
| Retrieval | BRIGHT 12 datasets aggregate (test) | NDCG@10 | 12.74 | 20 |
| Question Answering | HotpotQA (test) | EM | 32.7 | 18 |
| Information Retrieval | MS MARCO (dev) | -- | -- | 15 |
| Information Retrieval | Multi-hop | NDCG@10 | 58.16 | 12 |
| Open-domain Question Answering | NQ 3.5K (test) | EM | 0.261 | 5 |
| Open-domain Question Answering | TriviaQA 11.3K (test) | EM | 40.2 | 5 |
