
Compressive Transformers for Long-Range Sequence Modelling

About

We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results in the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19.
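As a rough illustration of the compressive-memory idea described above, the sketch below keeps a FIFO memory of recent activations and, rather than discarding the oldest entries as they fall out of range, squeezes them into a smaller secondary memory. This is a minimal sketch under stated assumptions, not the authors' implementation: the class and parameter names (CompressiveMemory, mem_len, cmem_len, compression_rate) are invented for illustration, and mean pooling is only one of the compression functions the paper considers (alongside convolutional and attention-based variants).

```python
# Minimal compressive-memory sketch, assuming PyTorch. Names and the mean-pooling
# compression function are illustrative assumptions, not the paper's exact code.
import torch


class CompressiveMemory:
    def __init__(self, mem_len: int, cmem_len: int, compression_rate: int = 3):
        self.mem_len = mem_len          # slots in the uncompressed FIFO memory
        self.cmem_len = cmem_len        # slots in the compressed (secondary) memory
        self.rate = compression_rate    # old slots merged into one compressed slot
        self.mem = None                 # [<=mem_len, d_model] recent activations
        self.cmem = None                # [<=cmem_len, d_model] compressed older activations

    def update(self, hidden: torch.Tensor) -> None:
        """Append a new segment's activations; compress whatever falls off the end."""
        d_model = hidden.size(-1)
        if self.mem is None:
            self.mem = hidden.new_zeros(0, d_model)
            self.cmem = hidden.new_zeros(0, d_model)
        self.mem = torch.cat([self.mem, hidden], dim=0)
        overflow = self.mem.size(0) - self.mem_len
        if overflow > 0:
            old, self.mem = self.mem[:overflow], self.mem[overflow:]
            # Pad to a multiple of the compression rate, then mean-pool groups of
            # `rate` old slots into single compressed slots (zero padding slightly
            # dilutes the final slot; a simplification for this sketch).
            pad = (-old.size(0)) % self.rate
            if pad:
                old = torch.cat([old, old.new_zeros(pad, d_model)], dim=0)
            compressed = old.view(-1, self.rate, d_model).mean(dim=1)
            self.cmem = torch.cat([self.cmem, compressed], dim=0)[-self.cmem_len:]


# Example: a 6-slot memory with a 4-slot compressed memory and compression rate 3.
memory = CompressiveMemory(mem_len=6, cmem_len=4, compression_rate=3)
for _ in range(4):
    memory.update(torch.randn(3, 16))  # segments of 3 timesteps, d_model = 16
print(memory.mem.shape, memory.cmem.shape)  # torch.Size([6, 16]) torch.Size([2, 16])
```

In the full model, each layer attends over the concatenation of its compressed memory, its uncompressed memory, and the current segment, which is what extends the effective context without growing the attention cost proportionally.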

Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap • 2019

Related benchmarks

Task                               | Dataset                  | Metric           | Result | Rank
Language Modeling                  | WikiText-103 (test)      | Perplexity       | 15.8   | 524
Character-level Language Modeling  | enwik8 (test)            | BPC              | 0.97   | 195
Language Modeling                  | WikiText-103 (val)       | PPL              | 16     | 180
Language Modeling                  | PG-19 (test)             | Perplexity       | 33.6   | 106
Language Modeling                  | PG-19 (val)              | Perplexity       | 43.4   | 19
Object Collision                   | Object Collision (test)  | Test Error       | 0.638  | 6
Document Grounded Dialogue         | CMU-DoG                  | Perplexity (PPL) | 18.02  | 5
