KGGen: Extracting Knowledge Graphs from Plain Text with Language Models
About
Recent interest in building foundation models for KGs has highlighted a fundamental challenge: knowledge-graph data is relatively scarce. The best-known KGs are primarily human-labeled, created by pattern matching, or extracted with early NLP techniques. Human-generated KGs are in short supply, while automatically extracted KGs are of questionable quality. We present a solution to this data-scarcity problem in the form of a text-to-KG generator (KGGen), a package that uses language models to create high-quality graphs from plain text. Unlike other KG extractors, KGGen clusters related entities to reduce sparsity in the extracted KGs. KGGen is available as a Python library (`pip install kg-gen`), making it accessible to everyone. Alongside KGGen, we release the first benchmark, Measure of Information in Nodes and Edges (MINE), which tests an extractor's ability to produce a useful KG from plain text. We benchmark our new tool against existing extractors and demonstrate far superior performance.
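To illustrate the entity-clustering idea described above, here is a minimal, self-contained sketch. It is not KGGen's actual implementation (KGGen uses language models for extraction and clustering); it merely shows, with a toy string-normalization heuristic, how mapping near-duplicate entity labels to one representative makes extracted triples share nodes instead of fragmenting the graph. The function names and the normalization rule are illustrative assumptions.

```python
# Toy sketch of entity clustering for extracted (subject, predicate, object)
# triples. NOT KGGen's real algorithm -- just a demonstration of why merging
# near-duplicate entity labels reduces sparsity in the resulting graph.

def canonical(entity: str) -> str:
    """Illustrative normalization: lowercase, strip, drop a trailing 's'."""
    e = entity.strip().lower()
    return e[:-1] if e.endswith("s") and len(e) > 3 else e

def cluster_triples(triples):
    """Rewrite each subject/object to one representative label per cluster."""
    reps = {}  # canonical form -> first surface form seen
    out = []
    for s, p, o in triples:
        for name in (s, o):
            reps.setdefault(canonical(name), name)
        out.append((reps[canonical(s)], p, reps[canonical(o)]))
    return out

# "Knowledge graphs" and "knowledge graph" collapse into a single node,
# so the two triples connect instead of forming disjoint fragments.
triples = [
    ("Knowledge graphs", "used for", "retrieval"),
    ("knowledge graph", "built from", "plain text"),
]
clustered = cluster_triples(triples)
print(clustered)
```

With clustering, the two triples share the node `Knowledge graphs`, whereas a naive extractor would emit two disconnected subgraphs.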
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Fact Verification | MINE | Accuracy | 75.4 | 28 |
| Knowledge Graph Extraction | MINE benchmark 1.0 (100 articles) (test) | NFI | 0.187 | 25 |
| Knowledge Graph Construction | MINE benchmark (100 articles) | Mean Node Count | 57.2 | 25 |
| Factual Retention | MINE | Factual Retention (%) | 69.1 | 25 |
| Knowledge Graph Extraction | NLP | Node Significance | 53.57 | 15 |
| Knowledge Graph Extraction | SQL | Node Significance | 58.12 | 15 |
| Knowledge Graph Extraction | Algorithms | Node Significance | 52.99 | 15 |
| Knowledge Graph Information Retention | MINE-1 | MINE-1 Score | 73 | 6 |