Repetition Improves Language Model Embeddings

About

Bidirectional models are considered essential for strong text embeddings. Recent approaches that adapt autoregressive language models (LMs) into strong text embedding models have largely required modifying the LM architecture to be bidirectional. We challenge this premise by introducing "echo embeddings", which convert autoregressive LMs into high-quality text embedding models without changing the architecture or requiring fine-tuning. By repeating the input and extracting embeddings from the repeated tokens, which have access to all original tokens, echo embeddings improve over classical LM embeddings by over 5% in zero-shot settings. Our zero-shot embeddings nearly match those obtained by bidirectionally-converted LMs that undergo additional masked-language modeling training. Echo embeddings are also compatible with supervised fine-tuning, matching or outperforming bidirectionally-converted LMs in an apples-to-apples comparison, even with an identical compute budget during training and inference. Overall, repetition is a simple and effective strategy to circumvent the need for bidirectional attention in embedding models, paving the way towards a unified architecture for all NLP tasks.

Jacob Mitchell Springer, Suhas Kotha, Daniel Fried, Graham Neubig, Aditi Raghunathan • 2024
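
The repetition idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`build_echo_input`, `visible_positions`, `mean_pool`), the plain concatenation of the input with itself, the use of mean pooling, and the stand-in hidden states are all assumptions made for illustration. It only shows why, under a causal mask, tokens in the second copy can see every original token while tokens in the first copy cannot.

```python
from typing import List, Sequence, Tuple


def build_echo_input(tokens: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Repeat the tokens once (hypothetical helper).

    Returns the doubled sequence and the positions of the second, echoed copy.
    """
    n = len(tokens)
    doubled = list(tokens) + list(tokens)
    echo_positions = list(range(n, 2 * n))
    return doubled, echo_positions


def visible_positions(position: int) -> List[int]:
    """Positions a token can attend to under a causal (left-to-right) mask."""
    return list(range(position + 1))


def mean_pool(hidden_states: Sequence[Sequence[float]],
              positions: Sequence[int]) -> List[float]:
    """Average the hidden-state vectors at the given positions.

    Mean pooling is an assumption of this sketch, not a claim about the paper.
    """
    if not positions:
        raise ValueError("no positions to pool over")
    dim = len(hidden_states[positions[0]])
    return [
        sum(hidden_states[p][d] for p in positions) / len(positions)
        for d in range(dim)
    ]


tokens = ["the", "cat", "sat"]
doubled, echo_positions = build_echo_input(tokens)

# Without repetition, the first token can attend only to itself.
first_token_view = visible_positions(0)

# With repetition, every echoed token can attend to all original tokens.
echo_sees_all = all(
    set(range(len(tokens))) <= set(visible_positions(p))
    for p in echo_positions
)

# Stand-in hidden states, one 2-d vector per position.
# A real autoregressive LM would produce these; the values here are made up.
hidden_states = [[float(i), float(2 * i)] for i in range(len(doubled))]

# Pool only over the echoed copy to get the embedding.
echo_embedding = mean_pool(hidden_states, echo_positions)
```

With a real model, the stand-in `hidden_states` would be replaced by the model's last-layer activations for the doubled sequence, and the pooling would still be restricted to `echo_positions`.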

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Semantic Textual Similarity | STS tasks (STS12, STS13, STS14, STS15, STS16, STS-B, SICK-R) various (test) | STS12 Score 59.36 | 393 |
| Semantic Textual Similarity | STS tasks (STS12, STS13, STS14, STS15, STS16, STS-B, SICK-R) | STS12 Score 52.4 | 195 |
| Semantic Textual Similarity | STS (Semantic Textual Similarity) 2012-2016 (test) | STS-12 Score 50.43 | 57 |
| Text Embedding | MTEB | MTEB Score 64.68 | 45 |
| Clustering | MTEB Clustering | Bior Score 22.94 | 23 |
| Retrieval | LoCo 1024 tokens V1 | NDCG@10 0.1128 | 12 |
| Retrieval | LoCo 2048 tokens V1 | NDCG@10 0.0658 | 12 |
| Retrieval | LoCo 4096 tokens V1 | NDCG@10 13.18 | 12 |
| Clustering | 20 Newsgroups | -- | 5 |
| Clustering | BiorxivClustering | V-Measure 25.92 | 3 |
Showing 10 of 11 rows
