
WikiCREM: A Large Unsupervised Corpus for Coreference Resolution

About

Pronoun resolution is a major area of natural language understanding. However, large-scale training sets are still scarce, since manually labelling data is costly. In this work, we introduce WikiCREM (Wikipedia CoREferences Masked), a large-scale yet accurate dataset of pronoun disambiguation instances. We use a language-model-based approach for pronoun resolution in combination with our WikiCREM dataset. We compare a series of models on a collection of diverse and challenging coreference resolution problems, where we match or outperform previous state-of-the-art approaches on 6 out of 7 datasets, such as GAP, DPR, WNLI, PDP, WinoBias, and WinoGender. We release our model to be used off-the-shelf for solving pronoun disambiguation.

Vid Kocijan, Oana-Maria Camburu, Ana-Maria Cretu, Yordan Yordanov, Phil Blunsom, Thomas Lukasiewicz • 2019

Related benchmarks

| Task                   | Dataset                   | Metric     | Result | Rank |
|------------------------|---------------------------|------------|--------|------|
| Coreference Resolution | GAP (test)                | Overall F1 | 78     | 53   |
| Pronoun Resolution     | WinoGrande                | Accuracy   | 64.9   | 35   |
| Coreference Resolution | Winograd WSC273 (test)    | Accuracy   | 83.2   | 34   |
| Pronoun Disambiguation | Winograd Schema Challenge | Accuracy   | 71.8   | 27   |
| Pronoun Resolution     | DPR                       | Accuracy   | 0.8    | 14   |
| Coreference Resolution | Winogender (WG) (test)    | Accuracy   | 77.1   | 11   |
| Pronoun Resolution     | KnowRef                   | Accuracy   | 65     | 8    |
| Coreference Resolution | DPR (test)                | Accuracy   | 90.6   | 7    |
| Coreference Resolution | PDP (test)                | Accuracy   | 93.3   | 7    |