
Answer is All You Need: Instruction-following Text Embedding via Answering the Question

About

This work aims to build a text embedder that can capture characteristics of texts specified by user instructions. Despite the tremendous potential of such user-oriented embeddings, no previous approach provides a concrete solution. This paper offers a new viewpoint, which treats the instruction as a question about the input text and encodes the expected answers to obtain the representation. Intuitively, texts with the same (implicit) semantics share similar answers under the instruction, which leads to more similar embeddings. Specifically, we propose InBedder, which instantiates this embed-via-answering idea by fine-tuning language models only on abstractive question answering tasks. InBedder demonstrates significantly improved instruction-following capabilities according to our proposed instruction awareness tests and instruction robustness tests, when applied to both large language models (LLMs) (e.g., llama-2-7b) and smaller encoder-based LMs (e.g., roberta-large). Additionally, our qualitative analysis of clustering outcomes, obtained by applying different instructions to the same corpus, demonstrates a high degree of interpretability.
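The embed-via-answering idea can be illustrated with a toy sketch. This is not InBedder itself: the abstract describes fine-tuning language models on abstractive question answering, whereas the sketch below swaps in a hypothetical keyword lookup (`TOY_ANSWERS`, `toy_answer`) for the answering model and a bag-of-words vector for the learned encoding. All names and example texts are assumptions made for illustration. It only shows the intended effect: the same texts get different similarity structure under different instructions.

```python
import math
from collections import Counter

# Hypothetical stand-in for a fine-tuned answering model: one keyword table
# per instruction. A real system would produce the answer with a language model.
TOPIC = "What is the topic?"
SENTIMENT = "What is the sentiment?"
TOY_ANSWERS = {
    TOPIC: {"match": "sports", "goal": "sports",
            "election": "politics", "senate": "politics"},
    SENTIMENT: {"thrilling": "positive", "great": "positive",
                "boring": "negative", "awful": "negative"},
}

def toy_answer(instruction, text):
    """Answer the instruction about the text using the keyword table."""
    table = TOY_ANSWERS[instruction]
    hits = [table[w] for w in text.lower().split() if w in table]
    return " ".join(hits) if hits else "unknown"

def embed(instruction, text):
    """Represent the text by a bag-of-words vector of its expected answer."""
    return Counter(toy_answer(instruction, text).split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

t1 = "a thrilling match with a late goal"
t2 = "a boring match with no goal"
t3 = "a thrilling election night in the senate"

# Under the topic instruction, t1 is close to t2 (both sports) and far from t3.
topic_12 = cosine(embed(TOPIC, t1), embed(TOPIC, t2))
topic_13 = cosine(embed(TOPIC, t1), embed(TOPIC, t3))

# Under the sentiment instruction, the grouping flips: t1 is close to t3.
sent_12 = cosine(embed(SENTIMENT, t1), embed(SENTIMENT, t2))
sent_13 = cosine(embed(SENTIMENT, t1), embed(SENTIMENT, t3))
```

In this toy setting the instruction alone decides which pair of texts ends up similar, which is the behavior the instruction awareness tests are meant to probe.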

Letian Peng, Yuwei Zhang, Zilong Wang, Jayanth Srinivasa, Gaowen Liu, Zihan Wang, Jingbo Shang • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Real-time latency evaluation | AG-News | Latency (s) | 7 | 15 |
| Latency Evaluation | Big Patent | Latency (s) | 27 | 13 |
| Instruction Awareness | InstructSTSB (test) | Spearman Correlation | 22.07 | 12 |
| Instruction Awareness | IntEmo (test) | Spearman Correlation | 0.9107 | 12 |
| Instruction Awareness | NYT (test) | Spearman Correlation | 64.65 | 12 |
| Sentence Embedding | MTEB Clustering standard (test) | AskU. Score | 60.32 | 12 |
| STS | Big Patent | Spearman Correlation | 0.3781 | 9 |
| Clustering | NYTClust | V-Measure | 72.7 | 9 |
| STS | MultiHate | Spearman Correlation | 4.57e+3 | 9 |
| Triplet Alignment | IntEmo | Accuracy | 92.64 | 9 |
Showing 10 of 17 rows
