Query-Efficient Agentic Graph Extraction Attacks on GraphRAG Systems
About
Graph-based retrieval-augmented generation (GraphRAG) systems construct knowledge graphs over document collections to support multi-hop reasoning. While prior work shows that GraphRAG responses may leak retrieved subgraphs, the feasibility of query-efficient reconstruction of the hidden graph structure remains unexplored under realistic query budgets. We study a budget-constrained black-box setting where an adversary adaptively queries the system to steal its latent entity-relation graph. We propose AGEA (Agentic Graph Extraction Attack), a framework that leverages a novelty-guided exploration-exploitation strategy, external graph memory modules, and a two-stage graph extraction pipeline combining lightweight discovery with LLM-based filtering. We evaluate AGEA on medical, agriculture, and literary datasets across Microsoft-GraphRAG and LightRAG systems. Under identical query budgets, AGEA significantly outperforms prior attack baselines, recovering up to 90% of entities and relationships while maintaining high precision. These results demonstrate that modern GraphRAG systems are highly vulnerable to structured, agentic extraction attacks, even under strict query limits.
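The abstract names three moving parts: a novelty-guided exploration-exploitation strategy, an external graph memory, and a two-stage extraction pipeline. The sketch below is one possible reading of how they could fit together under a fixed query budget, not the authors' implementation. Everything in it is an assumption made for illustration: the names `GraphMemory`, `run_attack`, `toy_query`, and `keep`, the novelty score (share of newly seen triples per response), the `threshold` value, and the toy hidden graph that stands in for a real GraphRAG endpoint.

```python
from dataclasses import dataclass, field


@dataclass
class GraphMemory:
    """Attacker-side store of everything recovered so far (hypothetical design)."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)
    queried: set = field(default_factory=set)

    def merge(self, triples):
        """Add (head, relation, tail) triples; return the share that was new."""
        triples = set(triples)
        fresh = triples - self.edges
        self.edges |= fresh
        for head, _, tail in fresh:
            self.nodes.update((head, tail))
        return len(fresh) / len(triples) if triples else 0.0

    def degree(self, node):
        """Number of known edges touching `node`."""
        return sum(node in (head, tail) for head, _, tail in self.edges)


def run_attack(query, seed, budget, keep=lambda triple: True, threshold=0.5):
    """Spend at most `budget` queries, steering by the novelty of each response."""
    memory = GraphMemory(nodes={seed})
    target = seed
    for _ in range(budget):
        memory.queried.add(target)
        candidates = query(target)                      # stage 1: lightweight discovery
        accepted = [t for t in candidates if keep(t)]   # stage 2: stand-in for LLM filtering
        novelty = memory.merge(accepted)

        frontier = sorted(memory.nodes - memory.queried)
        if not frontier:
            break
        neighbours = [n for n in frontier
                      if any({target, n} == {h, t} for h, _, t in memory.edges)]
        if novelty >= threshold and neighbours:
            target = neighbours[0]                      # exploit: stay in the productive region
        else:
            target = min(frontier, key=memory.degree)   # explore: least-covered known entity
    return memory


# Toy hidden graph standing in for the victim system (assumption, not real data).
HIDDEN = {
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "warfarin"),
    ("warfarin", "treats", "thrombosis"),
    ("headache", "symptom_of", "migraine"),
}


def toy_query(entity):
    """Pretend black-box: leak every triple that mentions the queried entity."""
    return [t for t in HIDDEN if entity in (t[0], t[2])]


memory = run_attack(toy_query, seed="aspirin", budget=5)
```

In a real setting `query` would wrap a natural-language prompt to the target system and a parser over its answer, and `keep` would be replaced by the LLM-based filter the abstract mentions; both are deliberately left as plain callables here.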
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Graph Extraction Attack | M-GraphRAG Medical 1.0 (test) | Leakage (Nodes) | 87.09 | 10 |
| Importance-based Node Leakage | Medical | Leakage (Degree) | 95.4 | 10 |
| Importance-based Node Leakage | Agriculture | Leakage (Degree) | 92.7 | 10 |
| Graph Extraction Attack | Agriculture M-GraphRAG 1.0 (test) | Leakage (Nodes) | 84.67 | 5 |
| Graph Extraction Attack | Agriculture LightRAG 1.0 (test) | Leakage (Nodes) | 88.05 | 5 |
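The leakage values in the table are percentages. A minimal sketch of how such scores could be computed follows, assuming that node leakage is the share of ground-truth nodes present in the extracted set (a recall-style score), that precision is the share of extracted nodes that are genuine, and that the degree-based variant restricts the ground truth to the `k` highest-degree nodes. These definitions, the function names, and the toy data are assumptions for illustration; the exact metric definitions are not given in the text above.

```python
from collections import Counter


def leakage(true_items, extracted_items):
    """Percentage of ground-truth items that the attacker recovered."""
    true_items = set(true_items)
    if not true_items:
        return 0.0
    return 100.0 * len(true_items & set(extracted_items)) / len(true_items)


def precision(true_items, extracted_items):
    """Percentage of extracted items that really exist in the hidden graph."""
    extracted = set(extracted_items)
    if not extracted:
        return 0.0
    return 100.0 * len(extracted & set(true_items)) / len(extracted)


def top_degree_nodes(edges, k):
    """The k nodes with the most incident edges (ties broken alphabetically)."""
    degree = Counter()
    for head, _, tail in edges:
        degree[head] += 1
        degree[tail] += 1
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    return set(ranked[:k])


# Toy ground truth and attacker output (hypothetical values).
true_edges = [("a", "r", "b"), ("a", "r", "c"), ("a", "r", "d"), ("b", "r", "c")]
true_nodes = {"a", "b", "c", "d"}
extracted_nodes = {"a", "b", "e"}

node_leakage = leakage(true_nodes, extracted_nodes)
node_precision = precision(true_nodes, extracted_nodes)
degree_leakage = leakage(top_degree_nodes(true_edges, k=2), extracted_nodes)
```

Under these assumptions an attack can score higher on degree-based leakage than on plain node leakage, because hub entities tend to surface in many responses while peripheral ones are easier to miss.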