No-Worse Context-Aware Decoding: Preventing Neutral Regression in Context-Conditioned Generation

About

Large language models (LLMs) can answer questions and summarize documents when conditioned on external contexts (e.g., retrieved evidence), yet context use remains unreliable: models may overwrite an already-correct output (neutral regression) even when the context is non-informative. We formalize neutral regression as a do-no-harm requirement and quantify it by measuring accuracy drops on baseline-correct items under answer-consistent contexts. We propose No-Worse Context-Aware Decoding (NWCAD), a decode-time adapter built on a two-stream setup with a two-stage gate: it backs off to no-context decoding when the context is non-informative, and otherwise uses context-conditioned decoding with a CAD-style fallback under uncertainty. We evaluate NWCAD on benchmarks that separate do-no-harm reliability from context utilization (accuracy gains on genuinely helpful contexts). NWCAD prevents neutral regression on baseline-correct items while preserving strong context-driven accuracy on helpful contexts.
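The two-stage gate described above can be pictured with a minimal per-token sketch. Everything specific in it is an assumption made for illustration, not taken from the paper: the function name `gated_next_token_logits`, the use of Jensen-Shannon divergence between the two streams as a stand-in for "the context is non-informative", the use of the top-token probability as a stand-in for "uncertainty", and all threshold values. The only standard piece is the CAD-style contrast, (1 + alpha) * context logits - alpha * no-context logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def js_divergence(p, q):
    """Jensen-Shannon divergence (in nats) between two distributions."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0 here.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def gated_next_token_logits(
    logits_ctx,
    logits_no_ctx,
    shift_threshold=0.05,      # hypothetical: below this, context is "non-informative"
    confidence_threshold=0.6,  # hypothetical: below this, the model is "uncertain"
    alpha=0.5,                 # CAD contrast strength (illustrative value)
):
    """Return (logits, branch) for one decoding step over the two streams.

    Assumed gating signals, not the paper's actual criteria.
    """
    p_ctx = softmax(logits_ctx)
    p_no = softmax(logits_no_ctx)

    # Stage 1: if the context barely moves the distribution, back off
    # to no-context decoding.
    if js_divergence(p_ctx, p_no) < shift_threshold:
        return list(logits_no_ctx), "no_context"

    # Stage 2: context looks informative. Use the context-conditioned
    # stream when it is confident.
    if max(p_ctx) >= confidence_threshold:
        return list(logits_ctx), "context"

    # Otherwise fall back to a CAD-style contrast of the two streams.
    cad = [(1 + alpha) * c - alpha * n for c, n in zip(logits_ctx, logits_no_ctx)]
    return cad, "cad_fallback"


if __name__ == "__main__":
    print(gated_next_token_logits([2.0, 0.0, 0.0], [2.0, 0.0, 0.0])[1])
    print(gated_next_token_logits([5.0, 0.0, 0.0], [0.0, 5.0, 0.0])[1])
    print(gated_next_token_logits([1.0, 0.8, 0.0], [0.0, 0.0, 2.0])[1])
```

In this sketch the two logit lists would come from running the same model once with the context prepended and once without it. A real gate could equally operate per sequence rather than per token; the abstract does not say which.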

Yufei Tao, Ameeta Agrawal • 2026

Related benchmarks

Task	Dataset	Result
Question Answering	PopQA	Accuracy 87.12	158
Table Question Answering	TabMWP	Accuracy 63.6	97
Question Answering	NQ-Open (val)	Accuracy 49.62	46
Question Answering	NQ-Swap	Accuracy 73	38
Question Answering	TabMWP	Accuracy 54.2	26
Dialogue Summarization	TofuEval	ToFuEval Score 83.12	18
Long-form Question Answering	ExpertQA	ROUGE-L 23.34	18
Question Answering	Restate hard	Accuracy 94.4	18
Question Answering	Distractor hard	Accuracy (Distractor hard) 62.2	18
Question Answering	HELPFUL	Accuracy 90.21	18

Showing 10 of 16 rows
