SAGE: Structure Aware Graph Expansion for Retrieval of Heterogeneous Data

About

Retrieval-augmented question answering over heterogeneous corpora requires connected evidence across text, tables, and graph nodes. Entity-level knowledge graphs support structured access, but they are costly to construct and maintain and inefficient to traverse at query time. Standard retriever-reader pipelines, in contrast, run flat similarity search over independently chunked text and so miss multi-hop evidence chains across modalities. We propose SAGE (Structure Aware Graph Expansion), a framework that (i) constructs a chunk-level graph offline using metadata-driven similarities with percentile-based pruning, and (ii) performs online retrieval by running an initial baseline retriever to obtain k seed chunks, expanding to their first-hop neighbors, and then filtering those neighbors with dense+sparse retrieval to select k' additional chunks. We instantiate the initial retriever with hybrid dense+sparse retrieval for implicit cross-modal corpora and with SPARK (Structure Aware Planning Agent for Retrieval over Knowledge Graphs), an agentic retriever, for explicit schema graphs. On OTT-QA and STaRK, SAGE improves retrieval recall over baselines by 5.7 and 8.5 points, respectively.
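
The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Jaccard overlap stands in for the metadata-driven similarity, the nearest-rank percentile threshold stands in for the pruning rule, and a toy token-overlap scorer stands in for both the baseline retriever and the dense+sparse neighbor filter. All function names, parameters, and the example data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Toy stand-in (assumption) for a metadata-driven chunk similarity."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def build_chunk_graph(metadata, percentile=75.0):
    """Offline step: keep only edges whose similarity reaches the percentile."""
    pairs = {
        (u, v): jaccard(metadata[u], metadata[v])
        for u, v in combinations(sorted(metadata), 2)
    }
    graph = {cid: set() for cid in metadata}
    scores = sorted(pairs.values())
    if not scores:
        return graph
    # Nearest-rank percentile threshold over all pairwise similarities.
    idx = min(len(scores) - 1, int(len(scores) * percentile / 100.0))
    threshold = scores[idx]
    for (u, v), s in pairs.items():
        if s > 0 and s >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def sage_retrieve(query, score, graph, k=2, k_prime=2):
    """Online step: k seeds, first-hop expansion, keep the k' best neighbors."""
    ranked = sorted(graph, key=lambda c: score(query, c), reverse=True)
    seeds = ranked[:k]
    neighbors = set().union(*(graph[s] for s in seeds)) - set(seeds)
    extra = sorted(neighbors, key=lambda c: score(query, c), reverse=True)
    return seeds + extra[:k_prime]


# Hypothetical corpus: three text chunks and one table chunk with metadata tags.
metadata = {
    "t1": ["paris", "france"],
    "t2": ["paris", "louvre"],
    "tab1": ["louvre", "museum"],
    "t3": ["tokyo", "japan"],
}


def overlap_score(query, chunk_id):
    """Toy scorer (assumption) used for both seeding and neighbor filtering."""
    return len(set(query.split()) & set(metadata[chunk_id]))


graph = build_chunk_graph(metadata)
result = sage_retrieve("paris france", overlap_score, graph, k=1, k_prime=1)
print(result)  # ['t1', 't2']
```

With k=1, flat retrieval alone would return only t1; the one-hop expansion adds t2, which shares metadata with the seed. A real system would replace the toy scorer with actual dense and sparse retrievers.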

Prasham Titiya, Rohit Khoja, Tomer Wolfson, Vivek Gupta, Dan Roth • 2026

Related benchmarks

Task                       Dataset                      Result            Rank
Knowledge Graph Retrieval  STaRK-Amazon 1.0 (Human)     Hits@1: 45.1      32
Retrieval                  STaRK AMAZON Synthetic       Recall@20: 58.28  20
Retrieval                  STaRK MAG Synthetic          Recall@20: 70.1   20
Retrieval                  STaRK PRIME Synthetic        Recall@20: 60.98  20
Knowledge Graph Retrieval  STaRK-Prime Synthetic 1.0    Hits@1: 0.3086    20
Knowledge Graph Retrieval  STaRK-MAG Synthetic 1.0      Hits@1: 43.2      20
Knowledge Graph Retrieval  STaRK-Amazon Synthetic 1.0   Hits@1: 28.41     20
Retrieval                  STaRK MAG Human              Recall@20: 56.4   16
Retrieval                  STaRK PRIME Human            Recall@20: 66.37  16
Knowledge Graph Retrieval  STaRK-Prime 1.0 (Human)      Hits@1: 0.422     16
Showing 10 of 11 rows
