In-Context Optimization for Retrieval-Augmented Generation: A Gradient-Descent Perspective

About

In-context learning has recently been linked to implicit gradient descent in linear self-attention models, suggesting that context can induce a forward-pass update. Retrieval-augmented generation (RAG) also relies on context, but retrieved documents are usually treated as static evidence rather than signals for adaptation. We study RAG as an in-context optimization process. First, we show that one linear self-attention layer can implement one gradient-descent step on a unified linearized RAG objective covering both projection-based and dot-product retrieval interfaces. This gives an exact regime where retrieval-augmented prediction and in-context optimization coincide. We use this result not as a literal model of LLM computation, but as a guide for adapting the interaction between queries and retrieved evidence. We then test the boundary of this correspondence: it remains stable under controlled linear extensions, but becomes feature-distribution dependent under nonlinear architectures. Finally, we turn this view into a lightweight method for frozen RAG LLMs. The method keeps the retriever and backbone fixed, and predicts a context-conditioned update to a generator-side evidence-use interface. Across seven QA benchmarks, two retrievers, and two frozen LLM backbones, this forward-only update improves a shared-interface baseline, transfers to held-out tasks, and approaches test-time gradient adaptation at much lower per-query cost.
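The first claim above, that one linear self-attention layer can implement one gradient-descent step, can be illustrated on ordinary least squares. The sketch below is not the paper's unified linearized RAG objective, which this page does not specify. It assumes a plain regression loss L(w) = 1/(2N) * sum_j (w . x_j - y_j)^2, a start at w = 0, and tokens of the form (x_j, y_j) with a query token (x_query, 0). All function names and parameters are hypothetical and chosen only for illustration.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(u * v for u, v in zip(a, b))


def gd_step_prediction(xs, ys, x_query, eta):
    """Take one gradient step from w = 0 on the least-squares loss, then predict."""
    n, d = len(xs), len(x_query)
    # Gradient of L at w = 0 is -(1/N) * sum_j y_j * x_j.
    grad = [-sum(y * x[i] for x, y in zip(xs, ys)) / n for i in range(d)]
    w1 = [-eta * g for g in grad]  # w1 = w0 - eta * grad, with w0 = 0
    return dot(w1, x_query)


def linear_attention_prediction(xs, ys, x_query, eta):
    """Softmax-free attention read-out for the query token (assumed weights).

    Keys and queries read the x part of each token, values read the y part,
    and the output projection scales by eta / N. The y slot of the query
    token starts at 0, so the attention update is the prediction itself.
    """
    n = len(xs)
    scores = [dot(x, x_query) for x in xs]  # key-query inner products
    return (eta / n) * sum(y * s for y, s in zip(ys, scores))


def demo(seed=0, n=8, d=3, eta=0.1):
    """Compare both predictions on a small random regression problem."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    w_true = [rng.gauss(0, 1) for _ in range(d)]
    ys = [dot(w_true, x) for x in xs]
    x_query = [rng.gauss(0, 1) for _ in range(d)]
    return (
        gd_step_prediction(xs, ys, x_query, eta),
        linear_attention_prediction(xs, ys, x_query, eta),
    )
```

Calling `demo()` returns the two predictions, which agree up to floating-point rounding. In this toy setting the agreement is an algebraic identity: both expressions reduce to (eta / N) * sum_j y_j * (x_j . x_query). With a softmax or other nonlinearity in the attention scores the identity no longer holds in general, which is in line with the abstract's remark that the correspondence becomes feature-distribution dependent under nonlinear architectures.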

Mingchen Li, Jiatan Huang, Chuxu Zhang, Liang Zhao, Hong Yu • 2026

Related benchmarks

Task               | Dataset                 | Result                 | Rank
-------------------|-------------------------|------------------------|-----
Question Answering | Bamboogle               | EM 32                  | 227
Question Answering | MuSiQue                 | F1 Score 23.47         | 80
Question Answering | 2WikiMultihopQA         | Exact Match 34.15      | 50
Question Answering | Natural Questions (NQ)  | Exact Match (EM) 45.68 | 32
Question Answering | Average 7 QA benchmarks | EM 40.54               | 14
