Retrieval from Within: An Intrinsic Capability of Attention-Based Models

About

Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retrieval via Attention), a framework where decoder attention queries score pre-encoded evidence chunks that are then directly reused as context for generation. By construction, INTRA unifies retrieval and generation, eliminating the retriever-generator mismatch typical of RAG pipelines. This design also amortizes context encoding by reusing precomputed encoder states across queries. On question-answering benchmarks, INTRA outperforms strong engineered retrieval pipelines on both evidence recall and end-to-end answer quality. Our results demonstrate that attention-based models already possess a retrieval mechanism that can be elicited, rather than added as an external module.
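The mechanism described in the abstract, where decoder attention queries score pre-encoded chunks and the selected chunks' states are reused as context, can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names (`score_chunk`, `retrieve`, `attend`), the use of a single query vector, and the max-over-tokens scaled dot-product scoring rule are all hypothetical. The abstract does not specify how INTRA aggregates attention scores.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def score_chunk(query, chunk_states):
    """Hypothetical chunk score: the strongest scaled dot-product match
    between one decoder query vector and any encoder state in the chunk."""
    scale = math.sqrt(len(query))
    return max(dot(query, state) / scale for state in chunk_states)

def retrieve(query, encoded_chunks, k):
    """Return the indices of the k highest-scoring chunks, best first."""
    scores = [score_chunk(query, states) for states in encoded_chunks]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def attend(query, encoded_chunks, selected):
    """Cross-attend over the cached states of the selected chunks only.
    The same cached states used for scoring are reused as context."""
    states = [s for i in selected for s in encoded_chunks[i]]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, s) / scale for s in states])
    return [sum(w * s[d] for w, s in zip(weights, states))
            for d in range(len(query))]

# Toy cache: three chunks, each a list of 2-d encoder states computed once
# and shared across all later queries (the amortization the abstract mentions).
encoded_chunks = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[-1.0, 0.0], [0.0, -1.0]],
]
query = [0.0, 2.0]  # stand-in for a decoder attention query
top = retrieve(query, encoded_chunks, k=2)
context = attend(query, encoded_chunks, top)
```

Because scoring and generation both read the same cached encoder states, the sketch has no separate retriever embedding space that could drift from what the decoder attends to, which is the mismatch the abstract says INTRA removes by construction.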

Elad Hoffer, Yochai Blau, Edan Kinderman, Ron Banner, Daniel Soudry, Boris Ginsburg • 2026

Related benchmarks

Task	Dataset	Result
Question Answering	2Wiki	--	260
Question Answering	NQ (test)	EM Accuracy 51.2	143
Question Answering	MuSiQue (test)	EM 14	85
Question Answering	MuSiQue	F1 Score 23	79
Retrieval	HotpotQA	R@5 59.9	68
Question Answering	2Wiki (test)	EM Accuracy 49.2	59
End-to-end Open-Domain Question Answering	NQ (test)	Exact Match (EM) 51.2	59
Question Answering	HotpotQA	F1 58	21
End-to-end Question Answering	HotpotQA (test)	EM 0.464	9
End-to-end Question Answering	MuSiQue (test)	EM 14	9

Showing 10 of 13 rows
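For readers unfamiliar with the metrics named in the table (EM, F1, R@5), the following is a minimal sketch of how they are commonly computed. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles) and defines recall@k as the fraction of gold evidence items found in the top k. The evaluation scripts behind the listed results may differ, and the function names are illustrative only.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of gold evidence ids that appear in the top-k ranking."""
    gold = set(gold_ids)
    if not gold:
        return 0.0
    return len(gold & set(ranked_ids[:k])) / len(gold)
```

These functions return scores in [0, 1]; multiplying by 100 gives the percentage scale on which such metrics are often reported.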
