Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
About
When trained effectively, the Variational Autoencoder (VAE) can be both a powerful generative model and an effective representation learning framework for natural language. In this paper, we propose Optimus, the first large-scale language VAE model. A universal latent embedding space for sentences is first pre-trained on a large text corpus, and then fine-tuned for various language generation and understanding tasks. Compared with GPT-2, Optimus enables guided language generation at an abstract level using the latent vectors. Compared with BERT, Optimus generalizes better on low-resource language understanding tasks thanks to the smooth structure of its latent space. Extensive experimental results on a wide range of language tasks demonstrate the effectiveness of Optimus; it achieves a new state of the art on VAE language modeling benchmarks. We hope that our pre-trained large VAE language model and its results help the NLP community renew its interest in deep generative models in the era of large-scale pre-training, and make these principled methods more practical.
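To make the architecture concrete, below is a minimal sketch (not the authors' released code) of an Optimus-style sentence VAE: a BERT encoder maps a sentence to a Gaussian latent code z, and a GPT-2 decoder is conditioned on z, here by adding a projection of z to its token embeddings. The class name, layer names, and this particular injection scheme are illustrative assumptions; the paper also describes injecting z into the decoder's attention layers.

```python
# A minimal, hypothetical sketch of an Optimus-style sentence VAE.
import torch
import torch.nn as nn
from transformers import BertModel, GPT2LMHeadModel

class SentenceVAE(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-cased")
        self.decoder = GPT2LMHeadModel.from_pretrained("gpt2")
        hid_enc = self.encoder.config.hidden_size
        hid_dec = self.decoder.config.n_embd
        self.to_mu = nn.Linear(hid_enc, latent_dim)      # posterior mean
        self.to_logvar = nn.Linear(hid_enc, latent_dim)  # posterior log-variance
        self.z_to_emb = nn.Linear(latent_dim, hid_dec)   # inject z into the decoder

    def forward(self, enc_ids, enc_mask, dec_ids, beta=1.0):
        # Encode: use the [CLS] representation as the sentence summary.
        h = self.encoder(input_ids=enc_ids,
                         attention_mask=enc_mask).last_hidden_state[:, 0]
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # Decode: add the projected latent code to every token embedding.
        emb = self.decoder.transformer.wte(dec_ids) + self.z_to_emb(z).unsqueeze(1)
        out = self.decoder(inputs_embeds=emb, labels=dec_ids)
        # Negative ELBO: reconstruction loss + beta * KL(q(z|x) || N(0, I)).
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1).mean()
        return out.loss + beta * kl
```

The `beta` weight on the KL term is the standard knob for trading off reconstruction quality against the smoothness of the latent space that the low-resource generalization claim relies on.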
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Modeling | Penn Treebank (PTB) (test) | Perplexity 23.58 | 120 |
| Language Modeling | Yahoo (test) | -- | 48 |
| Language Modeling | Yelp (test) | Perplexity 21.99 | 35 |
| Mathematical Reasoning | Mathematics out-of-domain (test) | Accuracy 2 | 26 |
| Conclusion Generation | EntailmentBank (test) | BLEU 26 | 26 |
| Sentence Interpolation Smoothness | ARG0 (randomly sampled 200 sentence pairs) | Average IS 0.259 | 22 |
| Autoencoding | Mathematical expressions EVAL (test) | BLEU 96 | 22 |
| Language Modeling | Mathematical expression EVAL (test) | Exact Match 99 | 19 |
| Language Modeling | Explanatory sentences | BLEU 35 | 19 |
| Disentanglement | ARG0 | Accuracy 97.2 | 18 |
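The sentence interpolation rows measure how smoothly the latent space supports guided generation: encode two sentences, walk the line segment between their latent codes, and decode each point. A hedged sketch of that procedure, reusing the hypothetical `SentenceVAE` above (the helper names and greedy decoding are illustrative, not the released Optimus API):

```python
import torch

@torch.no_grad()
def greedy_decode(model, z, bos_id, eos_id, max_len=30):
    # Greedily decode token ids conditioned on a fixed latent z of shape
    # [1, latent_dim]; a tokenizer would turn the ids back into text.
    ids = torch.tensor([[bos_id]])
    for _ in range(max_len):
        emb = model.decoder.transformer.wte(ids) + model.z_to_emb(z).unsqueeze(1)
        next_id = model.decoder(inputs_embeds=emb).logits[:, -1].argmax(-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == eos_id:
            break
    return ids[0].tolist()

@torch.no_grad()
def interpolate(model, z_a, z_b, bos_id, eos_id, steps=10):
    # Decode along the segment from z_a to z_b; in a smooth latent space
    # the outputs transition gradually between the two endpoint sentences.
    return [greedy_decode(model, (1 - t) * z_a + t * z_b, bos_id, eos_id)
            for t in torch.linspace(0.0, 1.0, steps)]
```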