Beyond the Unit Hypersphere: Embedding Magnitude in Contrastive Learning
About
Cosine similarity is prevalent in contrastive learning, yet it treats embedding magnitude as noise. We systematically study magnitude learning through a framework that controls query-side and document-side normalization independently. First, we find that magnitude learning benefits retrieval and Retrieval-Augmented Generation (RAG), where queries and documents play distinct roles, but not Semantic Textual Similarity (STS) or CLIP, where the inputs are interchangeable. Second, query and document magnitudes serve different roles: document magnitude scales inference-time scores, while query magnitude modulates training gradients. Normalizing one side consistently outperforms normalizing both, and the condition number of the Fisher Information Matrix predicts which side to normalize. Third, magnitude learning improves out-of-domain generalization more than in-domain performance (gains of up to +72% vs. +7%), but requires retrieval-specialized pre-training or sufficient data. These findings provide practical guidance for retrieval and RAG across text and vision domains.
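The per-side normalization framework described above can be sketched minimally. This is an illustrative NumPy formulation, not the paper's implementation; the function name and arguments are assumptions. Normalizing both sides recovers cosine similarity, while leaving the document side unnormalized lets document magnitude scale the retrieval score.

```python
import numpy as np

def score(q, d, normalize_query=True, normalize_doc=True):
    """Score queries against documents with independent per-side normalization.

    q: (n_queries, dim) query embeddings; d: (n_docs, dim) document embeddings.
    Illustrative sketch: both flags True -> cosine similarity; normalize_doc=False
    keeps document magnitude, so a larger-norm document gets a larger score.
    """
    if normalize_query:
        q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    if normalize_doc:
        d = d / np.linalg.norm(d, axis=-1, keepdims=True)
    return q @ d.T  # (n_queries, n_docs) similarity matrix
```

For example, two documents pointing in the same direction but with norms 2 and 1 receive identical cosine scores, yet the larger-norm document scores twice as high when only the query side is normalized.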
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Information Retrieval | BEIR | Average NDCG@10 | 0.4401 | 62 |
| Information Retrieval | TREC DL 19 | nDCG@10 | 60.43 | 61 |
| Information Retrieval | TREC DL 20 | NDCG@10 | 59.69 | 50 |
| Retrieval | BRIGHT 12 datasets aggregate (test) | NDCG@10 | 12.74 | 20 |
| Question Answering | HotpotQA (test) | EM | 32.7 | 18 |
| Information Retrieval | MS MARCO (dev) | -- | -- | 15 |
| Information Retrieval | Multi-hop | NDCG@10 | 58.16 | 12 |
| Open-domain Question Answering | NQ 3.5K (test) | EM | 0.261 | 5 |
| Open-domain Question Answering | TriviaQA 11.3K (test) | EM | 40.2 | 5 |