Representation Learning of Entities and Documents from Knowledge Base Descriptions

About

In this paper, we describe TextEnt, a neural network model that learns distributed representations of entities and documents directly from a knowledge base (KB). Given a document in a KB consisting of words and entity annotations, we train our model to predict the entity that the document describes and map the document and its target entity close to each other in a continuous vector space. Our model is trained using a large number of documents extracted from Wikipedia. The performance of the proposed model is evaluated using two tasks, namely fine-grained entity typing and multiclass text classification. The results demonstrate that our model achieves state-of-the-art performance on both tasks. The code and the trained representations are made available online for further academic research.
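The training objective described above (predict the entity a document describes, so that the document and that entity end up close together in the vector space) can be illustrated with a toy sketch. This is not the authors' implementation. The abstract does not say how the document vector is built, so averaging word and entity-annotation embeddings is an assumption here, as are the names `mean_vector`, `entity_loss`, and `sgd_step`, the dot-product softmax scoring, the toy vocabulary, and the choice to update only the target-entity vectors.

```python
import math
import random

DIM = 4  # toy embedding size (assumption, far smaller than any real model)


def rand_vec(rng):
    """Small random initial embedding."""
    return [rng.uniform(-0.1, 0.1) for _ in range(DIM)]


def mean_vector(vectors):
    """Average a list of equal-length vectors component-wise."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entity_loss(doc_vec, target_vecs, target_index):
    """Cross-entropy loss for predicting the entity the document describes."""
    probs = softmax([dot(doc_vec, t) for t in target_vecs])
    return -math.log(probs[target_index]), probs


def sgd_step(doc_vec, target_vecs, probs, target_index, lr=1.0):
    """One gradient step on the target-entity vectors only (a simplification)."""
    for k, vec in enumerate(target_vecs):
        error = probs[k] - (1.0 if k == target_index else 0.0)
        for i in range(DIM):
            vec[i] -= lr * error * doc_vec[i]


rng = random.Random(0)
words = ("island", "country", "asia")           # hypothetical document words
word_emb = {w: rand_vec(rng) for w in words}
annotation_emb = {"Asia": rand_vec(rng)}        # hypothetical entity annotation
target_emb = [rand_vec(rng) for _ in range(3)]  # candidates; index 0 is the described entity

# Document vector: mean of word and entity-annotation embeddings (assumption).
doc_vec = mean_vector([word_emb[w] for w in words] + [annotation_emb["Asia"]])

loss, probs = entity_loss(doc_vec, target_emb, target_index=0)
sgd_step(doc_vec, target_emb, probs, target_index=0)
new_loss, _ = entity_loss(doc_vec, target_emb, target_index=0)
print(f"loss before: {loss:.4f}, after one step: {new_loss:.4f}")
```

Each step raises the score of the described entity and lowers the scores of the other candidates, which is the sense in which training pulls a document and its target entity together. A full model would also update the word and annotation embeddings and would use a far larger set of candidate entities.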

Ikuya Yamada, Hiroyuki Shindo, Yoshiyasu Takefuji • 2018

Related benchmarks

Task	Dataset	Metric	Result	
Text Classification	20News	Accuracy	84.5	143
Text Classification	R8	Accuracy	96.7	91
Text Classification	R8 (test)	Accuracy	96.7	56
Text Classification	20 Newsgroups by-date (test)	Accuracy	84.5	12
Entity Typing	FIGMENT 102 types (test)	P@1	93.2	8

