
SPECTER: Document-level Representation Learning using Citation-informed Transformers

About

Representation learning is a critical ingredient for natural language processing systems. Recent Transformer language models like BERT learn powerful textual representations, but these models are targeted towards token- and sentence-level training objectives and do not leverage information on inter-document relatedness, which limits their document-level representation power. For applications on scientific documents, such as classification and recommendation, the embeddings power strong performance on end tasks. We propose SPECTER, a new method to generate document-level embedding of scientific documents based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph. Unlike existing pretrained language models, SPECTER can be easily applied to downstream applications without task-specific fine-tuning. Additionally, to encourage further research on document-level models, we introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction, to document classification and recommendation. We show that SPECTER outperforms a variety of competitive baselines on the benchmark.
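The citation-graph pretraining signal described above is typically realized as a triplet margin loss: a query paper's embedding is pulled toward a paper it cites (positive) and pushed away from a non-cited paper (negative). A minimal NumPy sketch of that objective, with illustrative names and toy vectors not taken from the SPECTER codebase:

```python
import numpy as np

def triplet_margin_loss(query, positive, negative, margin=1.0):
    """Citation-based triplet loss: the L2 distance from the query to a
    cited paper (positive) should be smaller than the distance to a
    non-cited paper (negative) by at least `margin`."""
    d_pos = np.linalg.norm(query - positive)   # distance to cited paper
    d_neg = np.linalg.norm(query - negative)   # distance to non-cited paper
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D document embeddings (in SPECTER these come from the
# Transformer's representation of "title [SEP] abstract").
q = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])   # cited paper: close to the query
n = np.array([-1.0, 0.0])  # non-cited paper: far from the query

loss = triplet_margin_loss(q, p, n)  # 0.0: triplet already well separated
```

Minimizing this loss over many (query, cited, non-cited) triplets is what injects document-level relatedness into the embeddings.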

Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel S. Weld • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Cross-Corpus Ranking | Cross-Corpus Dataset | Avg. RFR | 1.23 | 20 |
| Citation Recommendation | Scientific Paper Domains, Natural science (test) | P@3 | 0.542 | 20 |
| Citation Recommendation | Scientific Paper Domains, Social science (test) | P@3 | 0.62 | 20 |
| Citation Recommendation | Scientific Paper Domains, Overall (test) | P@3 | 54.5 | 20 |
| Category Retrieval | Amazon Economics (test) | R@50 | 31.26 | 15 |
| Category Retrieval | Amazon Mathematics (test) | R@50 | 23.86 | 15 |
| Category Retrieval | Amazon Geology (test) | R@50 | 26.56 | 15 |
| Reviewer Assignment | LR-Bench | Loss (LR-PC) | 0.2048 | 14 |
| Classification | Amazon Mathematics 8-shot (test) | Macro F1 | 23.37 | 14 |
| Classification | Amazon Economics 8-shot (test) | Macro F1 | 16.16 | 14 |

Showing 10 of 26 rows
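The P@k (precision at k) and R@k (recall at k) metrics reported above measure the quality of a ranked retrieval list. A minimal sketch of how they are computed, assuming a ranked list of retrieved document IDs and a set of relevant ones (names and data are illustrative):

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked[:k]
    return sum(1 for doc in top_k if doc in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    top_k = ranked[:k]
    return sum(1 for doc in top_k if doc in relevant) / len(relevant)

# Toy example: four retrieved papers, three of which are relevant.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "d"}
p3 = precision_at_k(ranked, relevant, 3)  # 2 of top 3 relevant -> 2/3
r3 = recall_at_k(ranked, relevant, 3)     # 2 of 3 relevant found -> 2/3
```

Note that some rows report these metrics as fractions (e.g. 0.542) and others as percentages (e.g. 54.5); the values are reproduced as listed on the leaderboard.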

Other info

Code
