
Boosting Entity Linking Performance by Leveraging Unlabeled Documents

About

Modern entity linking systems rely on large collections of documents specifically annotated for the task (e.g., AIDA CoNLL). In contrast, we propose an approach that exploits only naturally occurring information: unlabeled documents and Wikipedia. Our approach consists of two stages. First, we construct a high-recall list of candidate entities for each mention in an unlabeled document. Second, we use the candidate lists as weak supervision to constrain our document-level entity linking model. The model treats entities as latent variables and, when estimated on a collection of unlabeled texts, learns to choose entities relying on both the local context of each mention and coherence with other entities in the document. The resulting approach rivals fully supervised state-of-the-art systems on standard test sets. It also approaches their performance in a very challenging setting: when evaluated on a test set sampled from the same data used to estimate the supervised systems. By comparing to Wikipedia-only training of our model, we demonstrate that modeling unlabeled documents is beneficial.
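The abstract does not spell out the training objective, so the following is only a minimal sketch of one way a candidate list could act as weak supervision: a margin loss that asks the best-scoring entity inside the candidate list to outscore the best-scoring entity outside it. The function name `candidate_margin_loss`, the `margin` value, and the toy scores are assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List


def candidate_margin_loss(
    scores: Dict[str, float],
    candidates: List[str],
    negatives: List[str],
    margin: float = 0.1,
) -> float:
    """Hinge loss that is zero once the best candidate beats the best negative by `margin`.

    The correct entity is never observed (it is latent); the only signal is
    the assumption that it lies somewhere in `candidates`.
    """
    best_candidate = max(scores[e] for e in candidates)
    best_negative = max(scores[e] for e in negatives)
    return max(0.0, margin + best_negative - best_candidate)


# Toy model scores for the mention "Paris"; all values are made up.
scores = {"Paris": 2.0, "Paris_Hilton": 0.5, "London": 1.0}
loss = candidate_margin_loss(scores, ["Paris", "Paris_Hilton"], ["London"])
print(loss)
```

In this toy case the best candidate already beats the negative by more than the margin, so the loss is zero. In a real system the scores would come from a model that combines local context with document-level coherence, as the abstract describes.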

Phong Le, Ivan Titov • 2019

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Entity Disambiguation | AIDA CoNLL (test) | In-KB Accuracy | 93.07 | 36 |
| Entity Linking | MSNBC | - | - | 36 |
| Named Entity Disambiguation | AIDA (test) | Micro InKB F1 | 89.6 | 25 |
| Named Entity Disambiguation | MSNBC out-of-domain (test) | Micro F1 (InKB) | 92.2 | 18 |
| Entity Disambiguation | Standard Entity Disambiguation Datasets (AIDA, MSNBC, AQUAINT, ACE2004, CWEB, WIKI) InKB (test) | AIDA Score | 89.7 | 15 |
| Named Entity Disambiguation | AQUAINT out-of-domain (test) | Micro F1 (InKB) | 90.7 | 13 |
| Named Entity Disambiguation | CWEB out-of-domain (test) | Micro F1 (InKB) | 78.2 | 13 |
| Named Entity Disambiguation | WIKI out-of-domain (test) | Micro F1 (InKB) | 81.7 | 13 |
| Named Entity Disambiguation | ACE out-of-domain 2004 (test) | Micro F1 (InKB) | 88.1 | 13 |
| Entity Linking | AQUAINT | F1 | 90.7 | 8 |
Showing 10 of 14 rows
