Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation
About
Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG (Tree-RAG) organizes documents into hierarchical indexes to support queries at multiple granularities. However, existing Tree-RAG methods, designed for single-document retrieval, face three critical challenges when scaling to cross-document multi-hop questions: (1) poor distribution adaptability, where $k$-means clustering introduces noise due to rigid distribution assumptions; (2) structural isolation, as tree indexes lack explicit cross-document connections; and (3) coarse abstraction, which obscures fine-grained details. To address these limitations, we propose $\Psi$-RAG, a Tree-RAG framework with two key components. The first is a hierarchical abstract tree index, built through an iterative "merging and collapse" process that adapts to the data distribution without a priori assumptions. The second is a multi-granular retrieval agent that interacts with the knowledge base through reorganized queries and an agent-powered hybrid retriever. $\Psi$-RAG supports diverse tasks, from token-level question answering to document-level summarization. On cross-document multi-hop QA benchmarks, it outperforms RAPTOR by 25.9% and HippoRAG 2 by 7.4% in average F1 score. Code is available at https://github.com/Newiz430/Psi-RAG.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | 2Wiki | EM | 69.1 | 241 |
| Question Answering | HotpotQA | EM | 62.2 | 173 |
| Question Answering | PopQA | EM | 46.7 | 98 |
| Retrieval | HotpotQA | Recall@5 | 96 | 68 |
| Question Answering | NQ | EM | 50.6 | 45 |
| Retrieval | 2Wiki | Recall@5 | 96.13 | 42 |
| Question Answering | MuSiQue | EM | 38.7 | 38 |
| Question Answering | Multihop-RAG | EM | 55.3 | 22 |
| Retrieval | NQ | Recall@2 | 46.08 | 9 |
| Retrieval | PopQA | Recall@2 | 43.35 | 9 |
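
For readers unfamiliar with the metrics named above, the following is a minimal sketch of how exact match (EM), token-level F1, and Recall@k are commonly computed for QA and retrieval benchmarks. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the function names are illustrative, and the exact evaluation protocol behind the listed results may differ.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized prediction equals the normalized gold answer."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred, ref = normalize(prediction), normalize(gold)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant passage IDs found among the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)
```

Benchmark scores are usually these per-question values averaged over the dataset and reported as percentages; when a question has several acceptable gold answers, the maximum over them is typically taken.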