
Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models

About

Large language models (LLMs) often struggle to accurately read and comprehend extremely long texts. Current methods for improvement typically rely on splitting long contexts into fixed-length chunks. However, fixed truncation risks separating semantically relevant content, leading to ambiguity and compromising accurate understanding. To overcome this limitation, we propose a straightforward approach for dynamically separating and selecting chunks of long context, facilitating a more streamlined input for LLMs. In particular, we compute semantic similarities between adjacent sentences, using lower similarities to adaptively divide long contexts into variable-length chunks. We further train a question-aware classifier to select sensitive chunks that are critical for answering specific questions. Experimental results on both single-hop and multi-hop question-answering benchmarks show that the proposed approach consistently outperforms strong baselines. Notably, it maintains robustness across a wide range of input lengths, handling sequences of up to 256k tokens. Our datasets and code are available at the following link: https://github.com/ECNU-Text-Computing/DCS
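The sketch below illustrates the two steps the abstract describes: dynamic chunking at low-similarity sentence boundaries, and question-aware chunk selection. It is not the authors' implementation (see the repository linked above); the embedding model name, the similarity threshold, and the use of plain cosine similarity in place of the paper's trained classifier are all illustrative assumptions.

```python
# Minimal sketch of dynamic chunking and selection, NOT the authors' code.
# Assumptions: sentence-transformers embeddings ("all-MiniLM-L6-v2" is an
# arbitrary choice), a hypothetical split threshold of 0.5, and cosine
# similarity of question vs. mean chunk embedding as a stand-in for the
# trained question-aware classifier described in the paper.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dynamic_chunks(sentences: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Split a long context into variable-length chunks, starting a new
    chunk wherever adjacent-sentence similarity drops below `threshold`."""
    embs = model.encode(sentences)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cosine(embs[i - 1], embs[i]) < threshold:  # semantic break
            chunks.append(current)
            current = []
        current.append(sentences[i])
    chunks.append(current)
    return chunks

def select_chunks(question: str, chunks: list[list[str]], top_k: int = 3) -> list[str]:
    """Rank chunks by relevance to the question and keep the top_k,
    preserving their original document order."""
    q_emb = model.encode([question])[0]
    chunk_embs = [model.encode(c).mean(axis=0) for c in chunks]
    scores = [cosine(q_emb, e) for e in chunk_embs]
    keep = np.argsort(scores)[::-1][:top_k]
    return [" ".join(chunks[i]) for i in sorted(keep)]
```

The selected chunks would then be concatenated into a shorter, question-focused input for the LLM; the paper reports that this keeps accuracy stable on inputs up to 256k tokens.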

Boheng Sheng, Jiacheng Yao, Meicong Zhang, Guoxiu He · 2025

Related benchmarks

Task                          | Dataset       | Result               | Rank
Multi-hop Question Answering  | HotpotQA      | -                    | 221
Multi-hop Question Answering  | 2WikiMQA      | -                    | 154
Abstractive Summarization     | Multi-News    | -                    | 47
Single-hop Question Answering | MFQA en       | Score: 45.83         | 22
Single-hop Question Answering | NarrativeQA   | Score: 23.89         | 22
Single-hop Question Answering | MFQA en 16k   | Overall Score: 23.76 | 22
Single-hop Question Answering | Qasper        | Score: 44.59         | 22
Single-hop Question Answering | Loogle SD     | Score: 45.1          | 17
Single-hop Question Answering | Factrecall en | Score: 29.89         | 17
Code Generation               | RepoBench-P   | Score: 15.04         | 5

(Showing 10 of 20 rows.)

Other info

Code: https://github.com/ECNU-Text-Computing/DCS
