Compressive Transformers for Long-Range Sequence Modelling
About
We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results in the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19.
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap • 2019
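As a rough illustration of the mechanism described above, the following is a minimal NumPy sketch of a compressive memory update, not the authors' implementation: it assumes mean-pooling as the compression function (the paper also considers convolutional and learned compressors), and the function name `update_memories` and the sizes `mem_len`, `comp_len`, and compression rate `c` are illustrative placeholders.

```python
import numpy as np

def update_memories(mem, comp_mem, new_states, mem_len=512, comp_len=512, c=3):
    """Sketch of a compressive memory update (assumed interface, not the paper's code).

    Append the current segment's hidden states to a FIFO memory; instead of
    discarding the states that fall off the end (as Transformer-XL does),
    compress them by a factor `c` (here: mean-pooling) and push them into a
    secondary compressed memory.

    mem:        [m, d]  uncompressed memory of past activations
    comp_mem:   [cm, d] compressed memory
    new_states: [s, d]  hidden states from the current segment
    """
    mem = np.concatenate([mem, new_states], axis=0)
    if mem.shape[0] > mem_len:
        evicted, mem = mem[:-mem_len], mem[-mem_len:]
        # Compression: pool every `c` consecutive evicted states into one slot
        # (any remainder shorter than `c` is dropped in this sketch).
        n = (evicted.shape[0] // c) * c
        if n > 0:
            pooled = evicted[:n].reshape(n // c, c, -1).mean(axis=1)
            comp_mem = np.concatenate([comp_mem, pooled], axis=0)[-comp_len:]
    return mem, comp_mem

# Example usage: feed 128-token segments through the memory update.
d = 16
mem, comp_mem = np.zeros((0, d)), np.zeros((0, d))
for _ in range(10):
    segment = np.random.randn(128, d)  # stand-in for one segment's hidden states
    mem, comp_mem = update_memories(mem, comp_mem, segment)
```

In the model, attention at each layer runs over the concatenation of compressed memory, memory, and the current segment, so the effective receptive field grows to roughly `s + mem_len + c * comp_len` tokens.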
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-103 (test) | Perplexity | 15.8 | 524 |
| Character-level Language Modeling | enwik8 (test) | BPC | 0.97 | 195 |
| Language Modeling | WikiText-103 (val) | PPL | 16 | 180 |
| Language Modeling | PG-19 (test) | Perplexity | 33.6 | 106 |
| Language Modeling | PG-19 (val) | Perplexity | 43.4 | 19 |
| Object Collision | Object Collision (test) | Test Error | 0.638 | 6 |
| Document Grounded Dialogue | CMU-DoG | Perplexity (PPL) | 18.02 | 5 |
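For reference, the two language-modelling metrics in the table are the standard ones: perplexity is the exponentiated mean negative log-likelihood per token, and bits-per-character is the mean negative base-2 log-likelihood per character. These definitions are generic, not specific to this paper.

```latex
% Standard metric definitions: perplexity over N tokens, BPC over M characters.
\[
\mathrm{PPL} = \exp\!\Big(-\tfrac{1}{N}\sum_{i=1}^{N}\log p(w_i \mid w_{<i})\Big),
\qquad
\mathrm{BPC} = -\tfrac{1}{M}\sum_{j=1}^{M}\log_2 p(c_j \mid c_{<j}).
\]
```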