
Retrieval from Within: An Intrinsic Capability of Attention-Based Models

About

Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retrieval via Attention), a framework where decoder attention queries score pre-encoded evidence chunks that are then directly reused as context for generation. By construction, INTRA unifies retrieval and generation, eliminating the retriever-generator mismatch typical of RAG pipelines. This design also amortizes context encoding by reusing precomputed encoder states across queries. On question-answering benchmarks, INTRA outperforms strong engineered retrieval pipelines on both evidence recall and end-to-end answer quality. Our results demonstrate that attention-based models already possess a retrieval mechanism that can be elicited, rather than added as an external module.
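The mechanism the abstract describes (a decoder attention query scores pre-encoded evidence chunks, and the selected chunks' encoder states are reused as generation context) can be illustrated with a toy sketch. This is not the authors' implementation. The function names, the log-sum-exp pooling of attention logits into a chunk score, and the top-k selection are all assumptions made for illustration, since the abstract does not specify the scoring rule.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score_chunk(query, chunk_states):
    """Pool scaled dot-product attention logits over one chunk's tokens.

    Assumption: log-sum-exp pooling. The paper's actual scoring rule may differ.
    """
    scale = math.sqrt(len(query))
    logits = [dot(query, state) / scale for state in chunk_states]
    peak = max(logits)  # subtract the max for numerical stability
    return peak + math.log(sum(math.exp(l - peak) for l in logits))


def retrieve(query, encoded_chunks, top_k=2):
    """Rank chunks by score; return chosen indices and their states as context."""
    scores = [score_chunk(query, states) for states in encoded_chunks]
    ranked = sorted(range(len(encoded_chunks)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    # The same precomputed states that were scored are reused as context.
    context = [state for i in chosen for state in encoded_chunks[i]]
    return chosen, context


# Hypothetical encoder states: three chunks, two token vectors each.
# They are computed once and reused for every query.
encoded_chunks = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[-1.0, 0.0], [-0.8, 0.0]],
]

chosen_a, context_a = retrieve([2.0, 0.0], encoded_chunks, top_k=2)
chosen_b, context_b = retrieve([0.0, 2.0], encoded_chunks, top_k=2)
print(chosen_a, chosen_b)
```

Both queries read from the same `encoded_chunks` list, which is the amortization the abstract refers to: the evidence is encoded once, and only the query changes between calls.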

Elad Hoffer, Yochai Blau, Edan Kinderman, Ron Banner, Daniel Soudry, Boris Ginsburg • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Question Answering | 2Wiki | -- | 241 |
| Question Answering | NQ (test) | EM Accuracy 51.2 | 133 |
| Question Answering | MuSiQue | F1 Score 23 | 79 |
| Question Answering | MuSiQue (test) | EM 14 | 76 |
| Retrieval | HotpotQA | R@5 59.9 | 68 |
| End-to-end Open-Domain Question Answering | NQ (test) | Exact Match (EM) 51.2 | 59 |
| Question Answering | 2Wiki (test) | EM Accuracy 49.2 | 49 |
| Question Answering | HotpotQA | F1 58 | 21 |
| End-to-end Question Answering | HotpotQA (test) | EM 0.464 | 9 |
| End-to-end Question Answering | MuSiQue (test) | EM 14 | 9 |
Showing 10 of 13 rows
