
Semi-Supervised Sequence Modeling with Cross-View Training

About

Unsupervised representation learning algorithms such as word2vec and ELMo improve the accuracy of many supervised NLP models, mainly because they can take advantage of large amounts of unlabeled text. However, the supervised models only learn from task-specific labeled data during the main training phase. We therefore propose Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data. On labeled examples, standard supervised learning is used. On unlabeled examples, CVT teaches auxiliary prediction modules that see restricted views of the input (e.g., only part of a sentence) to match the predictions of the full model seeing the whole input. Since the auxiliary modules and the full model share intermediate representations, this in turn improves the full model. Moreover, we show that CVT is particularly effective when combined with multi-task learning. We evaluate CVT on five sequence tagging tasks, machine translation, and dependency parsing, achieving state-of-the-art results.
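The unlabeled-data objective described above can be sketched as a consistency loss: the full model's prediction (from the whole input) serves as a fixed target, and each auxiliary module (which sees only a restricted view) is trained to match it. A minimal illustration, assuming a KL-divergence loss over class distributions; the function names and logit values here are purely illustrative:

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cvt_consistency_loss(full_view_logits, restricted_view_logits):
    """KL(p_full || p_restricted).

    p_full comes from the primary module seeing the whole input and is
    treated as a fixed teacher target (no gradient would flow through it);
    p_restricted comes from an auxiliary module seeing a limited view,
    e.g. only the forward direction of the Bi-LSTM.
    """
    p = softmax(full_view_logits)        # teacher: full input
    q = softmax(restricted_view_logits)  # student: restricted view
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0)

# The loss is zero when the restricted view already matches the full
# model, and positive otherwise, pushing the shared representations to
# encode enough information for the restricted view to succeed.
matched = cvt_consistency_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = cvt_consistency_loss([0.0, 0.0, 5.0], [5.0, 0.0, 0.0])
```

Because the auxiliary modules share the encoder's intermediate representations, minimizing this loss on unlabeled text improves the encoder used by the full model as well.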

Kevin Clark, Minh-Thang Luong, Christopher D. Manning, Quoc V. Le • 2018

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Named Entity Recognition | CoNLL 2003 (test) | F1 score | 92.61 | 539 |
| Named Entity Recognition | CoNLL English 2003 (test) | F1 score | 92.61 | 135 |
| Named Entity Recognition | OntoNotes | F1 score | 88.8 | 91 |
| Named Entity Recognition | OntoNotes 5.0 (test) | F1 score | 88.88 | 90 |
| Chunking | CoNLL 2000 (test) | F1 score | 97 | 88 |
| Dependency Parsing | Penn Treebank (PTB) (test) | LAS | 95.02 | 80 |
| Named Entity Recognition | NER (test) | F1 score | 92.61 | 68 |
| Image Classification | CIFAR-10, 4,000 labels (test) | - | - | 57 |
| Slot Filling | ATIS (test) | F1 score | 94.8 | 55 |
| CCG Supertagging | CCGBank (test) | Accuracy | 96.1 | 35 |

Showing 10 of 26 rows

Other info

Code
