Uncertainty-driven Embedding Convolution

About

Text embeddings are essential components in modern NLP pipelines. Although numerous embedding models have been proposed, no single model consistently dominates across domains and tasks. This variability motivates the use of ensemble techniques to combine complementary strengths. However, most existing ensemble methods operate on deterministic embeddings and fail to account for model-specific uncertainty, limiting their robustness and reliability in downstream applications. To address these limitations, we propose Uncertainty-driven Embedding Convolution (UEC). UEC first transforms deterministic embeddings into probabilistic ones in a post-hoc manner. It then computes adaptive ensemble coefficients based on embedding uncertainty, derived from a principled surrogate-loss formulation. Additionally, UEC employs an uncertainty-aware similarity function that directly incorporates uncertainty into the similarity scoring, providing a theoretically grounded and efficient surrogate to distributional distances. Extensive experiments on diverse benchmarks demonstrate that UEC consistently improves both performance and robustness by leveraging principled uncertainty modeling.
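The abstract describes three steps (probabilistic embeddings, uncertainty-based ensemble coefficients, uncertainty-aware similarity) without giving formulas. The sketch below is only an illustration of that general idea and is not the authors' formulation. It assumes each model yields a diagonal-Gaussian embedding (a mean and a per-dimension variance) of the same dimensionality, combines them with standard inverse-variance weighting, and scores pairs with the negative expected squared Euclidean distance between independent Gaussians. The function names `fuse_gaussian_embeddings` and `neg_expected_sq_distance` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fuse_gaussian_embeddings(
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Inverse-variance (precision-weighted) fusion of K diagonal Gaussians.

    Assumption: all K embeddings share one dimensionality and every
    variance is strictly positive. This stands in for the paper's
    adaptive coefficients, which the abstract does not specify.
    """
    dims = len(means[0])
    fused_mean: List[float] = []
    fused_var: List[float] = []
    for d in range(dims):
        # A model that is less certain in dimension d gets a smaller weight.
        precisions = [1.0 / var[d] for var in variances]
        total = sum(precisions)
        fused_mean.append(
            sum(p * mu[d] for p, mu in zip(precisions, means)) / total
        )
        fused_var.append(1.0 / total)
    return fused_mean, fused_var


def neg_expected_sq_distance(
    mean_a: Sequence[float],
    var_a: Sequence[float],
    mean_b: Sequence[float],
    var_b: Sequence[float],
) -> float:
    """Similarity as -E[||a - b||^2] for independent diagonal Gaussians.

    Uses the identity E||a - b||^2 = ||mu_a - mu_b||^2 + tr(S_a) + tr(S_b),
    so higher uncertainty lowers the score. An illustrative choice only.
    """
    sq_dist = sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))
    return -(sq_dist + sum(var_a) + sum(var_b))


if __name__ == "__main__":
    # Two hypothetical models, each confident in a different dimension.
    mu, var = fuse_gaussian_embeddings(
        means=[[1.0, 0.0], [0.0, 1.0]],
        variances=[[0.1, 1.0], [1.0, 0.1]],
    )
    print(mu, var)
```

In this toy input each fused coordinate leans toward the model with the smaller variance in that dimension, which is the behavior the abstract attributes to uncertainty-based coefficients.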

Sungjun Lim, Kangjun Noh, Youngjun Choi, Heeyoung Lee, Kyungwoo Song • 2025

Related benchmarks

Task	Dataset	Result
Retrieval	BelebeleRetrieval	nDCG@10 95.82	26
Retrieval	SCIDOCS	nDCG@10 24.01	18
Sentiment Analysis	Poem sentiment	Accuracy 56.81	15
Retrieval	LegalBench CorporateLobbying	nDCG@10 93.56	12
Retrieval	WikipediaRetrieval Multilingual	nDCG@10 94.24	12
Retrieval	StackOverflowQA	nDCG@10 90.26	12
Classification	FinancialPhrasebank	Accuracy 83.02	11
Classification	MassiveIntentClassification	Accuracy 77.08	11
Classification	TweetTopic Single Classification	Accuracy 74.2	11
Semantic Textual Similarity	STS Benchmarks (STSB, FinPara, SICK-R, SemRel24, STS12, STS13, STS14, STS15, STS17, STS22) (test)	STSB Score 87.55	11

Showing 10 of 30 rows
