
Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering

About

Existing long-document question answering systems typically process texts as flat sequences or use heuristic chunking, both of which overlook the discourse structures that naturally guide human comprehension. We present a discourse-aware hierarchical framework that leverages Rhetorical Structure Theory (RST) for long-document question answering. Our approach converts discourse trees into sentence-level representations and employs LLM-enhanced node representations to bridge structural and semantic information. The framework involves three key innovations: language-universal discourse parsing for lengthy documents, LLM-based enhancement of discourse relation nodes, and structure-guided hierarchical retrieval. Extensive experiments on four datasets show that incorporating discourse structure yields consistent improvements over existing approaches, and that the framework remains robust across genres, document types, and languages.

Huiyao Chen, Yi Yang, Yinghui Li, Meishan Zhang, Baotian Hu, Min Zhang • 2025
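
The abstract names structure-guided hierarchical retrieval but does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hand-built toy tree (the paper derives trees with an RST parser), a hypothetical `Node` class, and bag-of-words cosine similarity standing in for the LLM-enhanced node representations. The `retrieve` function, its `beam` and `top_k` parameters, and the example sentences are all illustrative assumptions.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy discourse-tree node (hypothetical structure, not the paper's)."""
    text: str                 # sentence (leaf) or summary of the span (internal node)
    relation: str = "leaf"    # RST-style relation label, e.g. "Elaboration"
    children: list[Node] = field(default_factory=list)


def bow(text: str) -> Counter:
    """Lowercased bag of words; a stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(root: Node, query: str, beam: int = 2, top_k: int = 2) -> list[str]:
    """Descend the tree level by level, keeping the `beam` best nodes per level."""
    q = bow(query)

    def score(node: Node) -> float:
        return cosine(q, bow(node.text))

    frontier, leaves = [root], []
    while frontier:
        next_level = []
        for node in frontier:
            if node.children:
                next_level.extend(node.children)   # expand internal nodes
            else:
                leaves.append(node)                # surviving leaves are candidates
        next_level.sort(key=score, reverse=True)
        frontier = next_level[:beam]               # prune low-scoring branches
    leaves.sort(key=score, reverse=True)
    return [leaf.text for leaf in leaves[:top_k]]


# Hand-built example tree; internal-node texts play the role of span summaries.
root = Node("paper about retrieval and evaluation", "Joint", [
    Node("the method builds a discourse tree and retrieves sentences along it", "Elaboration", [
        Node("We parse each document into a discourse tree."),
        Node("Retrieval descends the tree from the root to sentences."),
    ]),
    Node("experiments report accuracy on four datasets", "Evidence", [
        Node("We evaluate on four datasets."),
        Node("Accuracy improves over chunking baselines."),
    ]),
])

result = retrieve(root, "How does retrieval use the discourse tree?", beam=1, top_k=1)
print(result)
```

The point of the sketch is the pruning step: branches whose summaries do not match the query are never expanded, so only sentences under relevant discourse units are scored. Flat chunking, by contrast, scores every chunk independently of its position in the document structure.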

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Question Answering | QASPER (test) | F1 Score (Match): 50.2 | 132 |
| Question Answering | QuALITY (test) | Accuracy: 77.71 | 90 |
| Question Answering | MultiFieldQA-zh | F1 Score: 35.25 | 30 |
| Retrieval | QASPER (test) | F1 Score: 30.27 | 30 |
| Question Answering | NarrativeQA | BLEU: 25.39 | 5 |
