A Multi-Task Embedder For Retrieval Augmented LLMs
About
LLMs face inherent limitations in knowledge, memory, and action. Retrieval augmentation is a vital mechanism for addressing these limitations: it brings in useful information from external sources to augment the LLM. However, existing retrieval methods encounter two pressing issues. On one hand, general retrievers are not properly optimized for retrieval augmentation and hence exhibit limited effectiveness; on the other hand, task-specific retrievers excel in their targeted retrieval augmentation scenario but lack the versatility to handle diverse scenarios. In this work, we propose LLM-Embedder for the unified support of diverse retrieval augmentation scenarios. Our method presents three technical contributions. Firstly, we introduce a new reward formulation, namely rank-aware reward. It exploits the ranking position of the desired output among $N$ sampled outputs from the LLM, which leads to fine-grained and robust computation of the reward from the LLM's feedback. Secondly, we design a novel distillation objective, called graded distillation. It incorporates both the absolute value and the relative order of the reward to make fuller use of the LLM's feedback. Thirdly, we systematically optimize the multi-task learning, which effectively unifies the multiple retrieval functionalities into one model. In our experiments, LLM-Embedder notably improves the LLM's performance on various downstream tasks and outperforms both general and task-specific retrievers by a substantial margin.
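As a rough illustration of the rank-aware reward idea, the sketch below ranks the desired output against $N$ sampled outputs and turns that rank into a reward. The abstract does not give the exact formula, so everything here is an assumption: the function names are hypothetical, the scores are taken to be LLM log-likelihoods computed with a candidate retrieved document in context, and the reciprocal-rank mapping is only one plausible choice, not the paper's definition.

```python
from typing import Sequence


def rank_of_desired(desired_score: float, sampled_scores: Sequence[float]) -> int:
    """Return the 1-based rank of the desired output among the sampled outputs.

    Rank 1 means no sampled output scored strictly higher than the desired one.
    Scores are assumed (not confirmed by the source) to be LLM log-likelihoods.
    """
    return 1 + sum(1 for s in sampled_scores if s > desired_score)


def rank_aware_reward(desired_score: float, sampled_scores: Sequence[float]) -> float:
    """Map the rank to a reward in (0, 1].

    Reciprocal rank is an illustrative assumption; the paper may use a
    different mapping from rank to reward.
    """
    return 1.0 / rank_of_desired(desired_score, sampled_scores)
```

Under these assumptions, a retrieval candidate that lifts the desired output above every sampled output receives the maximum reward, and the reward shrinks as more sampled outputs outrank it. Because only the ordering matters, the reward does not depend on the absolute scale of the likelihoods.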
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-hop Question Answering | 2WikiMultihopQA | EM | 45.72 | 278 |
| Multi-hop Question Answering | MuSiQue | EM | 18.36 | 106 |
| Multi-hop Question Answering | Bamboogle | Exact Match | 40.8 | 97 |
| Multi-hop Question Answering | HotpotQA | Exact Match (EM) | 41.39 | 56 |
| General Question Answering | TriviaQA | Exact Match | 62.33 | 39 |
| General Question Answering | PopQA | EM | 42.69 | 36 |
| General Question Answering | NQ | Exact Match (EM) | 41.32 | 36 |
| Question Answering | Combined 7 Datasets | Average Score | 39.82 | 18 |