Learning Cross-Context Entity Representations from Text

About

Language modeling tasks, in which words or word-pieces are predicted on the basis of a local context, have been very effective for learning word embeddings and context-dependent representations of phrases. Motivated by the observation that efforts to code world knowledge into machine-readable knowledge bases or human-readable encyclopedias tend to be entity-centric, we investigate the use of a fill-in-the-blank task to learn context-independent representations of entities from the text contexts in which those entities were mentioned. We show that large-scale training of neural models allows us to learn high-quality entity representations, and we demonstrate successful results on four domains: (1) existing entity-level typing benchmarks, including a 64% error reduction over previous work on TypeNet (Murty et al., 2018); (2) a novel few-shot category reconstruction task; (3) existing entity linking benchmarks, where we match the state of the art on CoNLL-Aida without linking-specific features and obtain a score of 89.8% on TAC-KBP 2010 without using any alias table, external knowledge base, or in-domain training data; and (4) answering trivia questions, which uniquely identify entities. Our global entity representations encode fine-grained type categories, such as Scottish footballers, and can answer trivia questions such as: Who was the last inmate of Spandau jail in Berlin?
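To make the fill-in-the-blank idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy setup in which a context with the entity mention blanked out is encoded as the average of its word vectors, then scored against a table of entity vectors by dot product and softmax. The vectors, the entity names (`footballer_A`, `diplomat_B`), and the function names are all hypothetical; the neural encoder and large-scale training described in the abstract are replaced here by hand-picked numbers.

```python
import math

# Toy, hand-picked vectors. Every name and number here is hypothetical.
WORD_VECS = {
    "scored": [1.0, 0.0],
    "goal": [1.0, 0.0],
    "signed": [0.0, 1.0],
    "treaty": [0.0, 1.0],
}
ENTITY_VECS = {
    "footballer_A": [2.0, 0.0],
    "diplomat_B": [0.0, 2.0],
}
BLANK = "[BLANK]"  # placeholder that replaces the entity mention


def encode_context(tokens):
    """Average the vectors of known context words, skipping the blank."""
    vecs = [WORD_VECS[t] for t in tokens if t != BLANK and t in WORD_VECS]
    if not vecs:
        return [0.0, 0.0]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def entity_distribution(tokens):
    """Softmax over dot products between the context and each entity vector."""
    ctx = encode_context(tokens)
    scores = {
        name: sum(c * x for c, x in zip(ctx, vec))
        for name, vec in ENTITY_VECS.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: v / total for name, v in exps.items()}


def fill_in_the_blank_loss(tokens, gold_entity):
    """Cross-entropy of the gold entity; training would minimize this."""
    return -math.log(entity_distribution(tokens)[gold_entity])


context = [BLANK, "scored", "a", "goal"]
probs = entity_distribution(context)
predicted = max(probs, key=probs.get)
print(predicted, fill_in_the_blank_loss(context, "footballer_A"))
```

In this sketch the entity vectors do not depend on any single context, which is the sense in which the learned representations are context-independent: each entity has one vector that must explain all the contexts in which it is mentioned.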

Jeffrey Ling, Nicholas FitzGerald, Zifei Shan, Livio Baldini Soares, Thibault Févry, David Weiss, Tom Kwiatkowski • 2020

Related benchmarks

Task	Dataset	Result
Entity Disambiguation	AIDA CoNLL (test)	In-KB Accuracy 94.9	36
Entity Linking	TAC-KBP 2010 (test)	Accuracy 89.8	16
Question Answering	TriviaQA Web domain Verified (test)	Exact Match (EM) 51.2	11
Entity Linking	CoNLL-Aida (test)	Accuracy 94.9	8
Entity Disambiguation	CoNLL table P (test)	Accuracy 94.9	7
Entity-level fine typing	FIGMENT	F1 Score 87.9	3
Entity-level fine typing	TypeNet (Full)	mAP 90.1	2
Entity-level fine typing	TypeNet Low Data (5%) (train)	mAP 85.3	2
Question Answering	TriviaQA Open-domain Unfiltered	Exact Match 35.7	2
