More Than Efficiency: Embedding Compression Improves Domain Adaptation in Dense Retrieval

About

Dense retrievers powered by pretrained embeddings are widely used for document retrieval but struggle in specialized domains because of mismatches between the training and target-domain distributions. Domain adaptation typically requires costly annotation of query-document pairs followed by retraining. In this work, we revisit an overlooked alternative: applying PCA to domain embeddings to derive lower-dimensional representations that preserve domain-relevant features while discarding non-discriminative components. Although PCA is traditionally used for efficiency, we demonstrate that this simple form of embedding compression can also improve retrieval performance. Across 9 retrievers and 14 MTEB datasets, PCA applied solely to query embeddings improves NDCG@10 in 75.4% of model-dataset pairs, offering a simple, lightweight method for domain adaptation.
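The abstract describes the method only at a high level. The following is a minimal NumPy sketch of the general idea, under assumptions that are ours rather than the paper's: PCA is fit on a matrix of in-domain document embeddings, and ranking uses cosine similarity. `project` compresses both queries and documents to k dimensions. `project_query_only` is one hypothetical way to compress only the query while still scoring it against full-dimensional document embeddings: the query is projected onto the top-k domain subspace and mapped back. The helper names, the value of k, and the random stand-in embeddings are all illustrative.

```python
import numpy as np


def fit_pca(domain_embeddings: np.ndarray, k: int):
    """Fit PCA on in-domain embeddings (n x d); return the mean and top-k axes (k x d)."""
    mean = domain_embeddings.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(domain_embeddings - mean, full_matrices=False)
    return mean, vt[:k]


def project(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Compress d-dim embeddings to k-dim PCA coordinates (used for queries and documents)."""
    return (x - mean) @ components.T


def project_query_only(q: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Hypothetical query-only variant (an assumption, not the paper's stated formula):
    keep the query d-dimensional but remove everything outside the top-k domain
    subspace, so it can be scored against uncompressed document embeddings."""
    return mean + (q - mean) @ components.T @ components


def cosine_scores(queries: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Cosine similarity between every query (rows) and every document (columns)."""
    qn = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    dn = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return qn @ dn.T


# Usage with random stand-ins for real embeddings (illustrative only).
rng = np.random.default_rng(0)
docs = rng.normal(size=(200, 64))   # hypothetical in-domain document embeddings
queries = rng.normal(size=(5, 64))  # hypothetical query embeddings
mean, comps = fit_pca(docs, k=16)   # k = 16 is an arbitrary choice

# Variant A: compress both sides to 16 dimensions.
scores_both = cosine_scores(project(queries, mean, comps), project(docs, mean, comps))
# Variant B: compress only the queries; documents stay 64-dimensional.
scores_query_only = cosine_scores(project_query_only(queries, mean, comps), docs)

top10 = np.argsort(-scores_query_only, axis=1)[:, :10]  # top-10 document ids per query
```

Variant B leaves the document index untouched, so only the query encoder's output changes at search time. Whether to fit PCA on document embeddings, query embeddings, or both, and how to choose k, are open choices in this sketch.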

Chunsheng Zuo, Daniel Khashabi • 2026

Related benchmarks

Task             Dataset    Metric                      Result  Rank
Dense Retrieval  SCIDOCS    Relative Improvement (%)    12.1    4
Dense Retrieval  ArguAna    Relative Improvement (%)    10.3    4
Dense Retrieval  FiQA       Relative Improvement (%)    14.3    4
Dense Retrieval  NFCorpus   Relative Improvement (%)    1.9     4
Dense Retrieval  SciFact    Relative Improvement (%)    3.2     4
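The table does not state what the relative improvement is measured on. Assuming it is the relative change in NDCG@10 (the metric named in the abstract) between rankings with and without PCA compression, the computation can be sketched as follows. The function names are illustrative, the gain is linear (one common convention; some implementations use 2^rel - 1), and the relevance lists are toy values, not data from the paper.

```python
import math


def dcg_at_k(relevances, k=10):
    """DCG over the first k results, with linear gain rel / log2(rank + 1), ranks from 1."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


def relative_improvement(new: float, baseline: float) -> float:
    """Relative change in percent, e.g. NDCG@10 with vs. without PCA."""
    return (new - baseline) / baseline * 100.0


# Toy relevance judgments for a single query (not data from the paper).
judged = [1, 1, 0, 0]                         # relevance labels of all judged documents
baseline = ndcg_at_k([0, 1, 0, 1], judged)    # ranking from uncompressed embeddings
compressed = ndcg_at_k([1, 0, 1, 0], judged)  # ranking after PCA
gain = relative_improvement(compressed, baseline)
```

Benchmark scores typically average NDCG@10 over all queries in a dataset before the relative change is taken.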
