
AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge

About

Knowledge conflict arises from discrepancies between the information in a large language model's (LLM's) context and the knowledge stored in its parameters. This can hurt performance under standard decoding techniques, which tend to ignore the context. Existing test-time contrastive methods address this by comparing the LLM's output distributions with and without the context and adjusting the model's output according to the contrast between them. However, we find that these methods frequently misjudge the degree of conflict and struggle with instances that vary in how much conflict they contain; static methods in particular over-adjust when conflict is absent. We propose a fine-grained, instance-level approach called AdaCAD, which dynamically infers the weight of the adjustment from the degree of conflict, as measured by the Jensen-Shannon divergence between the distributions representing contextual and parametric knowledge. Across four LLMs, six question-answering (QA) datasets, and three summarization datasets, we demonstrate that AdaCAD consistently outperforms other decoding baselines, with average QA accuracy gains of 14.21% (absolute) over a static contrastive baseline, and improves the factuality of summaries by 6.19 (AlignScore). Lastly, we show that while contrastive baselines hurt performance when conflict is absent, AdaCAD mitigates these losses, making it more applicable to real-world datasets in which some examples have conflict and others do not.
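
The adjustment described above can be sketched for a single decoding step. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the logits are toy values, and the combination rule (scale the with-context logits by 1 + alpha and subtract alpha times the without-context logits) is assumed from the usual form of contrastive context-aware decoding rather than stated in this abstract. Only the use of the Jensen-Shannon divergence as the per-step weight comes from the text.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence; with log base 2 it lies in [0, 1]."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)


def adaptive_contrastive_step(logits_with_context, logits_without_context):
    """Return (alpha, adjusted next-token distribution) for one step.

    alpha is the JSD between the with-context and without-context
    distributions. The combination rule below is an ASSUMPTION borrowed
    from static contrastive decoding, with alpha made adaptive.
    """
    p_context = softmax(logits_with_context)
    p_parametric = softmax(logits_without_context)
    alpha = js_divergence(p_context, p_parametric)
    adjusted = [
        (1 + alpha) * lc - alpha * lp
        for lc, lp in zip(logits_with_context, logits_without_context)
    ]
    return alpha, softmax(adjusted)


# Toy three-token vocabulary: the two passes agree in the first case
# (no conflict) and disagree in the second (conflict).
alpha_same, dist_same = adaptive_contrastive_step([2.0, 1.0, 0.0], [2.0, 1.0, 0.0])
alpha_diff, dist_diff = adaptive_contrastive_step([5.0, 0.0, 0.0], [0.0, 5.0, 0.0])
```

When the two distributions agree, alpha is zero and the with-context distribution is left unchanged; when they disagree, alpha grows and the token favored by the context is boosted further. A static method would instead apply the same fixed weight in both cases, which is the over-adjustment the abstract describes.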

Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal • 2024

Related benchmarks

Task                            Dataset                Result             Rank
Visual Question Answering       OK-VQA (test)          Accuracy 67.01     296
Abstractive Text Summarization  CNN/Daily Mail (test)  ROUGE-L 21.27      169
Question Answering              TriviaQA               EM 80.3            116
Visual Question Answering       InfoSeek (test)        Accuracy 52.04     60
Visual Question Answering       E-VQA (test)           Accuracy 75.65     56
Question Answering              SQuAD                  Exact Match 81.87  50
Abstractive Summarization       XSum (test)            ROUGE-L 15.81      44
Question Answering              HotpotQA               Exact Match 43.82  20
Question Answering              TabMWP                 EM 60              20
Question Answering              NQ                     EM 68.54           20
Showing 10 of 17 rows
