
Attention Sorting Combats Recency Bias In Long Context Language Models

About

Current language models often fail to incorporate long contexts efficiently during generation. We show that a major contributor to this issue is attention priors that are likely learned during pre-training: relevant information located earlier in the context is attended to less on average. Yet even when models fail to use the information from a relevant document in their response, they still pay preferential attention to that document compared to an irrelevant document at the same position. We leverage this fact to introduce "attention sorting": perform one step of decoding, sort documents by the attention they receive (highest attention going last), repeat the process, and generate the answer with the newly sorted context. We find that attention sorting improves the performance of long-context models. Our findings highlight some challenges in using off-the-shelf language models for retrieval-augmented generation.
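The attention-sorting loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `attention_per_doc` is a hypothetical stand-in for the per-document attention mass a real model would expose after one decoding step.

```python
# Hedged sketch of attention sorting: rank documents by the attention they
# receive during one decoding step, move the most-attended documents to the
# end of the context, and repeat before generating the final answer.

def attention_sort(docs, attention_per_doc, n_rounds=2):
    """Reorder `docs` so the most-attended documents appear last.

    docs: list of document strings forming the context.
    attention_per_doc: callable(docs) -> list of scores (one per doc),
        assumed to come from one step of decoding over the full context.
    n_rounds: how many decode-and-sort iterations to run.
    """
    for _ in range(n_rounds):
        scores = attention_per_doc(docs)
        # Sort ascending so the highest-attention documents land last,
        # closest to the generation position where recency bias helps.
        order = sorted(range(len(docs)), key=lambda i: scores[i])
        docs = [docs[i] for i in order]
    return docs

# Toy usage with a fake scorer that attends most to the relevant document.
docs = ["Paris is the capital of France.", "Lorem ipsum.", "Weather report."]
fake_scores = lambda ds: [1.0 if "Paris" in d else 0.1 for d in ds]
print(attention_sort(docs, fake_scores)[-1])
```

In practice the scores would come from the model's attention maps over each document's token span; the sketch only captures the reordering logic.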

Alexander Peysakhovich, Adam Lerer • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Question Answering | NQ (test) | -- | -- | 66
Retrieval-Augmented Generation | NaturalQuestion (20 documents) | Average Score | 0.6289 | 12
Retrieval-Augmented Generation | NaturalQuestion (10 documents) | Average Score | 65.06 | 12
Retrieval-Augmented Generation | SynthWiki (10 documents) | Average Score | 93.19 | 12
Retrieval-Augmented Generation | SynthWiki (20 documents) | Mean Score | 94.2 | 12
Open-domain Question Answering | WebQA (test) | Accuracy | 52.1 | 5
