The Scientific Contribution Graph: Automated Literature-based Technological Roadmapping at Scale
About
Scientific contributions rarely develop in isolation, but instead build upon prior discoveries. We formulate the task of automated technological roadmapping as extracting scientific contributions from scholarly articles and linking them to their prerequisites. We present the Scientific Contribution Graph, a large-scale AI/NLP-domain resource containing 2 million detailed scientific contributions extracted from 230k open-access papers and connected by 12.5 million prerequisite edges. We further introduce scientific prerequisite prediction, a scientific discovery task in which models predict which existing technologies can enable future discoveries, and show that contemporary models are rapidly improving on this task, reaching 0.48 MAP when evaluated using temporally filtered backtesting. We anticipate technological roadmapping resources such as this will support scientific impact assessment and automated scientific discovery.
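To make the reported metric concrete, here is a minimal sketch of how mean average precision (MAP) could be computed for prerequisite prediction under a temporal cutoff. The function names, the `year_of` mapping, and the filtering rule are illustrative assumptions, not the authors' evaluation code; in particular, the abstract does not specify how temporal filtering is done, so the sketch simply assumes that candidate prerequisites published after the cutoff year are dropped before scoring.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked candidate list against a gold set."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this hit's rank
    return precision_sum / len(relevant)


def mean_average_precision(
    predictions: Dict[str, List[str]], gold: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all gold queries."""
    scores = [
        average_precision(predictions.get(query, []), relevant)
        for query, relevant in gold.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


def temporally_filter(
    ranked: List[str], year_of: Dict[str, int], cutoff_year: int
) -> List[str]:
    """Assumed filtering rule: keep candidates dated at or before the cutoff.

    Candidates with an unknown year are dropped.
    """
    return [
        c for c in ranked if c in year_of and year_of[c] <= cutoff_year
    ]


# Hypothetical toy data: one future contribution and its ranked candidates.
year_of = {"attention": 2014, "transformer": 2017, "future-method": 2024}
ranked = ["future-method", "transformer", "attention"]
filtered = temporally_filter(ranked, year_of, cutoff_year=2021)
score = mean_average_precision(
    {"new-contribution": filtered}, {"new-contribution": {"attention"}}
)
```

With this toy data, the post-cutoff candidate is removed before scoring, so the single gold prerequisite lands at rank 2 of the filtered list.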
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Technological requirement identification | Scientific Contribution Graph 1.0 (entire set) | -- | 10 |
| Technological requirement identification | Scientific Contribution Graph 1.0 (pre-cutoff) | -- | 9 |
| Technological requirement identification | Scientific Contribution Graph 1.0 (post-cutoff) | -- | 9 |
| Seq2Seq contribution generation on full text | SCI. CONT. GRAPH | Number of Nodes2 | 1 |
| Span/relation labeling | SciERC | -- | 1 |
| Span/relation labeling | SciREX | -- | 1 |
| Span/relation labeling | NLPCONT | -- | 1 |
| Span/relation labeling | SCINLP-KG | -- | 1 |
| Span/relation labeling | SCICLAIM | -- | 1 |
| Span/relation labeling | CS-KG V2 | -- | 1 |