Language Model Pre-Training with Sparse Latent Typing
About
Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most LM pre-training objectives focus only on text reconstruction and have not sought to learn latent-level interpretable representations of sentences. In this paper, we push language models toward a deeper understanding of sentences by proposing a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model learns interpretable latent type categories in a self-supervised manner without using any external knowledge. Moreover, a language model pre-trained with this objective also significantly improves Information Extraction-related downstream tasks in both supervised and few-shot settings. Our code is publicly available at: https://github.com/renll/SparseLT.
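The core idea can be sketched in a few lines. The snippet below is a minimal, illustrative NumPy mock-up, not the repository's actual implementation: it assumes a learnable codebook of latent type embeddings, a Gumbel-softmax-style sampled type assignment per token, and a sparsity penalty that discourages assigning a (non-null) type to every token. All names (`codebook`, `gumbel_softmax`, the reserved "null" type at index 0) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=1.0):
    """Sample a soft (approximately one-hot) type assignment per token."""
    g = rng.gumbel(size=logits.shape)               # Gumbel noise for sampling
    y = (logits + g) / tau                          # temperature-scaled logits
    e = np.exp(y - y.max(axis=-1, keepdims=True))   # numerically stable softmax
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical setup: 5 tokens, hidden size 8, 4 latent types (index 0 = "no type").
num_types, hidden = 4, 8
tokens = rng.normal(size=(5, hidden))               # token encodings from the LM
codebook = rng.normal(size=(num_types, hidden))     # learnable latent type embeddings

logits = tokens @ codebook.T                        # similarity of each token to each type
assign = gumbel_softmax(logits, tau=0.5)            # soft type assignment per token
types = assign.argmax(axis=-1)                      # discrete latent type per token

# Sparsity objective: penalize probability mass on non-null types, so that
# only a few keyword-like tokens per sentence receive a latent type.
sparsity_loss = (1.0 - assign[:, 0]).mean()
```

In the actual model this term would be combined with a reconstruction loss and trained end-to-end, with the Gumbel-softmax making the discrete typing decision differentiable.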
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Natural Language Understanding | GLUE (dev) | SST-2 (Acc) 92.4 | 504 |
| Named Entity Recognition | Few-NERD INTER 1.0 (test) | Average F1 59.62 | 62 |
| Named Entity Recognition | Few-NERD INTRA | -- | 47 |
| Joint Information Extraction | ACE 2005 (test) | Entity F1 81.1 | 4 |
| Joint Information Extraction | ERE (test) | Entity F1 87.13 | 4 |