Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models
About
Knowledge graphs (KGs) provide structured, verifiable grounding for large language models (LLMs), but current LLM-based systems commonly use KGs as auxiliary structures for text retrieval, leaving their intrinsic quality underexplored. In this work, we propose Wikontic, a multi-stage pipeline that constructs KGs from open-domain text by extracting candidate triplets with qualifiers, enforcing Wikidata-based type and relation constraints, and normalizing entities to reduce duplication. The resulting KGs are compact, ontology-consistent, and well-connected; on MuSiQue, the correct answer entity appears in 96% of generated triplets. On HotpotQA, our triplets-only setup achieves 76.0 F1, and on MuSiQue 59.8 F1, matching or surpassing several retrieval-augmented generation baselines that still require textual context. In addition, Wikontic attains state-of-the-art information-retention performance on the MINE-1 benchmark (86%), outperforming prior KG construction methods. Wikontic is also efficient at build time: KG construction uses less than 1,000 output tokens, about 3× fewer than AriGraph and less than 1/20 of GraphRAG. The proposed pipeline enhances the quality of the generated KG and offers a scalable solution for leveraging structured knowledge in LLMs.
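The three stages named in the abstract (candidate triplets with qualifiers, type and relation constraints, entity normalization) can be illustrated with a small Python sketch. This is not the authors' implementation: the `Triplet` class, the toy constraint table, the alias table, and every function name below are hypothetical stand-ins. A real system would obtain candidates from an LLM and constraints from Wikidata, neither of which is modeled here.

```python
from dataclasses import dataclass, replace

# Hypothetical container for one candidate triplet plus optional qualifiers.
@dataclass(frozen=True)
class Triplet:
    subject: str
    relation: str
    object: str
    subject_type: str
    object_type: str
    qualifiers: tuple = ()  # e.g. (("point in time", "1990"),)

# Toy stand-in for ontology constraints (assumption, not real Wikidata data):
# relation -> (allowed subject types, allowed object types)
TOY_CONSTRAINTS = {
    "place of birth": ({"human"}, {"city", "country"}),
    "author": ({"book"}, {"human"}),
}

# Toy alias table mapping lowercased surface forms to a canonical label.
TOY_ALIASES = {
    "nyc": "New York City",
    "new york city": "New York City",
}

def satisfies_constraints(t: Triplet) -> bool:
    """Keep a triplet only if its relation is known and both types are allowed."""
    allowed = TOY_CONSTRAINTS.get(t.relation)
    if allowed is None:
        return False
    subject_types, object_types = allowed
    return t.subject_type in subject_types and t.object_type in object_types

def normalize_entity(name: str) -> str:
    """Map a surface form to its canonical label; unknown names pass through."""
    cleaned = name.strip()
    return TOY_ALIASES.get(cleaned.lower(), cleaned)

def build_graph(candidates):
    """Filter candidates by constraints, normalize entities, and deduplicate."""
    graph = set()
    for t in candidates:
        if not satisfies_constraints(t):
            continue
        graph.add(replace(
            t,
            subject=normalize_entity(t.subject),
            object=normalize_entity(t.object),
        ))
    return graph

# Made-up candidates: two surface variants of one fact, plus one type violation.
candidates = [
    Triplet("Jane Doe", "place of birth", "NYC", "human", "city"),
    Triplet("Jane Doe", "place of birth", "New York City", "human", "city"),
    Triplet("Jane Doe", "author", "Paris", "human", "city"),
]
graph = build_graph(candidates)
```

In this toy run the two surface variants collapse into a single edge after normalization, and the third candidate is dropped because its subject type does not fit the relation. This is the kind of duplication reduction and ontology consistency the abstract describes, shown here only at illustrative scale.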
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Question Answering | HotpotQA | EM | 46.4 | 173 |
| Question Answering | MuSiQue (test) | EM | 46.8 | 76 |
| Question Answering | MuSiQue | EM | 25 | 38 |
| Question Answering | HotpotQA (test) | EM | 64.5 | 18 |
| Knowledge Graph Information Retention | MINE-1 | MINE-1 Score | 86 | 17 |
| Question Answering | SpecsQA (test) | F1 (Factual Correctness) | 13.5 | 13 |
| Question Answering | SpecsQA | FC F1 | 13.5 | 13 |
| Knowledge Graph Construction | MuSiQue | Total Triples | 204 | 10 |
| Knowledge Graph Construction | HotpotQA | Total Triples Count | 117 | 10 |
| Knowledge Graph Extraction | HotpotQA | Avg Edge Multiplicity | 1.05 | 8 |