Regularizing and Optimizing LSTM Language Models
About
Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this paper, we consider the specific problem of word-level language modeling and investigate strategies for regularizing and optimizing LSTM-based models. We propose the weight-dropped LSTM, which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization. Further, we introduce NT-ASGD, a variant of the averaged stochastic gradient method, wherein the averaging trigger is determined using a non-monotonic condition as opposed to being tuned by the user. Using these and other regularization strategies, we achieve state-of-the-art word-level perplexities on two data sets: 57.3 on Penn Treebank and 65.8 on WikiText-2. In exploring the effectiveness of a neural cache in conjunction with our proposed model, we achieve an even lower state-of-the-art perplexity of 52.8 on Penn Treebank and 52.0 on WikiText-2.
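The two core ideas above can be sketched compactly. Below is a minimal NumPy illustration, not the authors' PyTorch implementation: `weight_drop` applies DropConnect to a hidden-to-hidden weight matrix (each recurrent weight is zeroed independently with probability `p`, with inverted-dropout scaling), and `nt_asgd_trigger` shows one reading of the non-monotonic condition (start averaging once the latest validation loss is no better than the best seen more than `n` checks ago). The function names and the exact form of the trigger are our assumptions; consult the paper's Algorithm 1 for the precise condition.

```python
import numpy as np

def weight_drop(w_hh, p, rng):
    """DropConnect on recurrent weights: zero each entry of the
    hidden-to-hidden matrix independently with probability p.
    In the weight-dropped LSTM this mask is sampled once per
    forward pass, so the same mask spans all time steps."""
    mask = rng.random(w_hh.shape) >= p
    return w_hh * mask / (1.0 - p)  # inverted-dropout scaling

def nt_asgd_trigger(val_losses, n=5):
    """Non-monotonic trigger for NT-ASGD (one plausible reading):
    begin averaging once the latest validation loss fails to beat
    the best loss observed more than n evaluation cycles ago."""
    t = len(val_losses)
    return t > n and val_losses[-1] > min(val_losses[: t - n])
```

While validation loss keeps improving, `nt_asgd_trigger` stays `False` and plain SGD continues; once it returns `True`, the optimizer switches to averaging iterates, removing the need for a hand-tuned trigger point.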
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 44.3 | 1541 |
| Language Modeling | PTB (test) | Perplexity | 57.3 | 471 |
| Language Modeling | Penn Treebank (test) | Perplexity | 51.1 | 411 |
| Language Modeling | WikiText2 v1 (test) | Perplexity | 52 | 341 |
| Language Modeling | WikiText2 (val) | Perplexity (PPL) | 46.4 | 277 |
| Character-level Language Modeling | enwik8 (test) | BPC | 1.232 | 195 |
| Language Modeling | Penn Treebank (val) | Perplexity | 51.6 | 178 |
| Language Modeling | Penn Treebank (PTB) (test) | Perplexity | 51.1 | 120 |
| Language Modeling | PTB (val) | Perplexity | 60 | 83 |
| Language Modeling | Penn Treebank word-level (test) | Perplexity | 51.1 | 72 |