Core-based Hierarchies for Efficient GraphRAG

About

Retrieval-Augmented Generation (RAG) enhances large language models by incorporating external knowledge. However, existing vector-based methods often fail on global sensemaking tasks that require reasoning across many documents. GraphRAG addresses this by organizing documents into a knowledge graph with hierarchical communities that can be recursively summarized. Current GraphRAG approaches rely on Leiden clustering for community detection, but we prove that on sparse knowledge graphs, where average degree is constant and most nodes have low degree, modularity optimization admits exponentially many near-optimal partitions, making Leiden-based communities inherently non-reproducible. To address this, we propose replacing Leiden with k-core decomposition, which yields a deterministic, density-aware hierarchy in linear time. We introduce a set of lightweight heuristics that leverage the k-core hierarchy to construct size-bounded, connectivity-preserving communities for retrieval and summarization, along with a token-budget-aware sampling strategy that reduces LLM costs. We evaluate our methods on real-world datasets including financial earnings transcripts, news articles, and podcasts, using three LLMs for answer generation and five independent LLM judges for head-to-head evaluation. Across datasets and models, our approach consistently improves answer comprehensiveness and diversity while reducing token usage, demonstrating that k-core-based GraphRAG is an effective and efficient framework for global sensemaking.
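To make the k-core idea concrete, here is a minimal sketch of the standard peeling procedure for computing core numbers, written in plain Python. It is not the authors' implementation: the function names `core_numbers` and `k_core_levels`, the edge-list input format, and the toy graph are all assumptions chosen for illustration. The size-bounded community heuristics and the token-budget-aware sampling described in the abstract are not shown.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph.

    Illustrative helper (hypothetical name). Repeatedly removes a node of
    minimum remaining degree; a node's core number is the largest minimum
    degree seen up to the moment it is removed.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)

    # buckets[d] holds the not-yet-removed nodes whose remaining degree is d
    buckets = [set() for _ in range(max_deg + 1)]
    for n, d in degree.items():
        buckets[d].add(n)

    core = {}
    k = 0  # current core level
    d = 0  # smallest bucket that may be non-empty
    while d <= max_deg:
        if not buckets[d]:
            d += 1
            continue
        k = max(k, d)
        n = buckets[d].pop()
        core[n] = k
        for m in adj[n]:
            if m in core:
                continue  # neighbor already removed
            buckets[degree[m]].discard(m)
            degree[m] -= 1
            buckets[degree[m]].add(m)
        # a neighbor can drop by at most one level, so step back one bucket
        d = max(d - 1, 0)
    return core


def k_core_levels(core):
    """Return {k: set of nodes with core number >= k}, a nested hierarchy."""
    top = max(core.values(), default=0)
    return {k: {n for n, c in core.items() if c >= k} for k in range(1, top + 1)}


# Toy graph (assumed): a triangle a-b-c with a two-node tail c-d-e.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
core = core_numbers(edges)
levels = k_core_levels(core)
print(core)
print(levels)
```

The order in which tied nodes are peeled can vary, but the resulting core numbers do not depend on it, which is the sense in which the hierarchy is deterministic. Each node is removed once and each edge is visited a constant number of times, so the running time is linear in the number of nodes plus edges.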

Jakir Hossain, Ahmet Erdem Sarıyüce • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Community Summary Evaluation | podcast C2 (post-cutoff) | Comprehensiveness Win: 58 | 6 |
| Community Summary Evaluation | podcast C3 (post-cutoff) | Comprehensiveness Win: 53 | 6 |
| Community Summary Evaluation | news C2 (post-cutoff) | Comprehensiveness Win: 56 | 6 |
| Community Summary Evaluation | news C3 (post-cutoff) | Comprehensiveness Win: 56 | 6 |
| Community Summary Evaluation | semiconductor C2 (post-cutoff) | Comprehensiveness Win: 60 | 6 |
| Community Summary Evaluation | semiconductor C3 (post-cutoff) | Comprehensiveness Win Count: 67 | 6 |
| Comprehensiveness | podcast Leiden level C0 | Win Rate: 64 | 6 |
| Comprehensiveness | podcast Leiden level C1 | Win Rate (C1 Podcast): 68 | 6 |
| Comprehensiveness | news Leiden level C0 | Win Rate (C0): 56 | 6 |
| Comprehensiveness | news Leiden level C1 | Win Rate: 61 | 6 |
Showing 10 of 36 rows
