In-Context Optimization for Retrieval-Augmented Generation: A Gradient-Descent Perspective

About

In-context learning has recently been linked to implicit gradient descent in linear self-attention models, suggesting that context can induce a forward-pass update. Retrieval-augmented generation (RAG) also relies on context, but retrieved documents are usually treated as static evidence rather than signals for adaptation. We study RAG as an in-context optimization process. First, we show that one linear self-attention layer can implement one gradient-descent step on a unified linearized RAG objective covering both projection-based and dot-product retrieval interfaces. This gives an exact regime where retrieval-augmented prediction and in-context optimization coincide. We use this result not as a literal model of LLM computation, but as a guide for adapting the interaction between queries and retrieved evidence. We then test the boundary of this correspondence: it remains stable under controlled linear extensions, but becomes feature-distribution dependent under nonlinear architectures. Finally, we turn this view into a lightweight method for frozen RAG LLMs. The method keeps the retriever and backbone fixed, and predicts a context-conditioned update to a generator-side evidence-use interface. Across seven QA benchmarks, two retrievers, and two frozen LLM backbones, this forward-only update improves a shared-interface baseline, transfers to held-out tasks, and approaches test-time gradient adaptation at much lower per-query cost.
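The correspondence between linear self-attention and gradient descent can be illustrated with a minimal NumPy sketch. It uses the standard in-context linear-regression construction from earlier work on implicit gradient descent, not the unified linearized RAG objective of this paper, whose exact form is not given here. The function names, the token layout [x_i; y_i], the weight matrices, and the zero initialization are all assumptions made for illustration.

```python
import numpy as np


def gd_step_prediction(X, y, x_query, lr):
    """Predict for x_query after one gradient-descent step from w = 0
    on the least-squares loss (1 / 2n) * sum_i (w @ x_i - y_i) ** 2."""
    n, d = X.shape
    w0 = np.zeros(d)
    grad = X.T @ (X @ w0 - y) / n
    w1 = w0 - lr * grad
    return float(w1 @ x_query)


def linear_attention_prediction(X, y, x_query, lr):
    """Same prediction from one softmax-free self-attention layer.

    Assumed layout: context tokens are [x_i; y_i], the query token is
    [x_query; 0]. Keys and queries read the x part, values read the y part.
    """
    n, d = X.shape
    tokens = np.hstack([X, y[:, None]])             # shape (n, d + 1)
    query_token = np.concatenate([x_query, [0.0]])  # shape (d + 1,)

    W_k = np.zeros((d + 1, d + 1))
    W_k[:d, :d] = np.eye(d)                         # keep only x coordinates
    W_q = W_k.copy()
    W_v = np.zeros((d + 1, d + 1))
    W_v[d, d] = 1.0                                 # keep only the y coordinate

    keys = tokens @ W_k.T
    values = tokens @ W_v.T
    scores = keys @ (W_q @ query_token)             # x_i @ x_query, no softmax
    update = (lr / n) * (values.T @ scores)         # shape (d + 1,)
    return float(update[d])                         # label slot of the query


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
y = X @ rng.normal(size=4)
x_query = rng.normal(size=4)

pred_gd = gd_step_prediction(X, y, x_query, lr=0.1)
pred_attn = linear_attention_prediction(X, y, x_query, lr=0.1)
print(np.isclose(pred_gd, pred_attn))
```

Both functions compute (lr / n) * sum_i y_i (x_i . x_query), so they agree up to floating-point error. In this toy setting the context examples play the role that retrieved evidence would play in the RAG formulation.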

Mingchen Li, Jiatan Huang, Chuxu Zhang, Liang Zhao, Hong Yu • 2026

Related benchmarks

Task	Dataset	Result
Question Answering	Bamboogle	EM 32	227
Question Answering	MuSiQue	F1 Score 23.47	80
Question Answering	2WikiMultihopQA	Exact Match 34.15	50
Question Answering	Natural Questions (NQ)	Exact Match (EM) 45.68	32
Question Answering	Average 7 QA benchmarks	EM 40.54	14

