Retrieval from Within: An Intrinsic Capability of Attention-Based Models
About
Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retrieval via Attention), a framework where decoder attention queries score pre-encoded evidence chunks that are then directly reused as context for generation. By construction, INTRA unifies retrieval and generation, eliminating the retriever-generator mismatch typical of RAG pipelines. This design also amortizes context encoding by reusing precomputed encoder states across queries. On question-answering benchmarks, INTRA outperforms strong engineered retrieval pipelines on both evidence recall and end-to-end answer quality. Our results demonstrate that attention-based models already possess a retrieval mechanism that can be elicited, rather than added as an external module.
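The retrieve-then-reuse idea described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a chunk's score is the maximum scaled dot-product logit between a single decoder query vector and that chunk's cached encoder states, and that the top-k chunks are concatenated as context. The names `score_chunks` and `retrieve_and_reuse`, the max aggregation, and the toy one-hot states are all hypothetical choices made for illustration.

```python
import numpy as np


def score_chunks(query, chunk_states):
    """Score each pre-encoded chunk against one decoder attention query.

    query: array of shape (d,).
    chunk_states: list of arrays, each of shape (n_tokens_i, d).
    Assumption: a chunk's score is its largest scaled dot-product logit.
    """
    d = query.shape[0]
    return np.array([(states @ query / np.sqrt(d)).max() for states in chunk_states])


def retrieve_and_reuse(query, chunk_states, k=2):
    """Select the k best-scoring chunks and return their cached states.

    The returned context is built from the precomputed encoder states,
    so no chunk is re-encoded at query time.
    """
    scores = score_chunks(query, chunk_states)
    top = np.argsort(-scores)[:k]  # indices of the k highest scores
    context = np.concatenate([chunk_states[i] for i in top], axis=0)
    return top, context


# Toy data: one-hot "encoder states", computed once and cached (hypothetical).
d = 8
basis = np.eye(d)
chunk_states = [basis[0:3], basis[3:5], basis[5:8]]

# A toy query that aligns strongly with chunk 1 and weakly with chunk 2.
query = 2.0 * basis[3] + 1.0 * basis[6]

top, context = retrieve_and_reuse(query, chunk_states, k=2)
print(top.tolist())   # [1, 2]
print(context.shape)  # (5, 8)
```

In this sketch the same cached `chunk_states` serve both retrieval and generation context, which is the amortization the abstract refers to. A real model would draw the query from a decoder attention head and would likely aggregate over heads and layers in a way this example does not attempt to reproduce.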
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Question Answering | 2Wiki | -- | 241 |
| Question Answering | NQ (test) | EM Accuracy 51.2 | 133 |
| Question Answering | MuSiQue | F1 Score 23 | 79 |
| Question Answering | MuSiQue (test) | EM 14 | 76 |
| Retrieval | HotpotQA | R@5 59.9 | 68 |
| End-to-end Open-Domain Question Answering | NQ (test) | Exact Match (EM) 51.2 | 59 |
| Question Answering | 2Wiki (test) | EM Accuracy 49.2 | 49 |
| Question Answering | HotpotQA | F1 58 | 21 |
| End-to-end Question Answering | HotpotQA (test) | EM 0.464 | 9 |
| End-to-end Question Answering | MuSiQue (test) | EM 14 | 9 |