Generative Pretraining for Paraphrase Evaluation
About
We introduce ParaBLEU, a paraphrase representation learning model and evaluation metric for text generation. Unlike previous approaches, ParaBLEU learns to understand paraphrasis using generative conditioning as a pretraining objective. ParaBLEU correlates more strongly with human judgements than existing metrics, obtaining new state-of-the-art results on the 2017 WMT Metrics Shared Task. We show that our model is robust to data scarcity, exceeding previous state-of-the-art performance using only $50\%$ of the available training data and surpassing BLEU, ROUGE and METEOR with only $40$ labelled examples. Finally, we demonstrate that ParaBLEU can be used to conditionally generate novel paraphrases from a single demonstration, which we use to confirm our hypothesis that it learns abstract, generalized paraphrase representations.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Paraphrase Detection | Microsoft Paraphrase Corpus | Accuracy | 88.8 | 21 |
| Machine Translation Evaluation | WMT17 (test) | Kendall Tau | 0.653 | 12 |
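
The Kendall Tau figure above measures how well a metric's ranking of candidate translations agrees with the ranking given by human judgements. As an illustration only, the following is a minimal sketch of the plain tau-a statistic over pairs of segments. The function name `kendall_tau` and the example scores are hypothetical, and the exact variant used to compute the reported WMT17 score may differ (for example, in how ties are handled).

```python
from itertools import combinations


def kendall_tau(metric_scores, human_scores):
    """Kendall tau-a: (concordant - discordant) / total number of pairs.

    A pair of segments is concordant when the metric and the human
    judgements order it the same way, and discordant when they order it
    in opposite ways. Tied pairs count as neither.
    """
    if len(metric_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    n = len(metric_scores)
    if n < 2:
        raise ValueError("need at least two segments to form a pair")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product tells us whether both differences
        # point in the same direction.
        product = (metric_scores[i] - metric_scores[j]) * (
            human_scores[i] - human_scores[j]
        )
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical scores for four segments (not taken from the paper).
metric = [0.2, 0.5, 0.4, 0.9]
human = [1, 2, 3, 4]
tau = kendall_tau(metric, human)  # 5 concordant pairs, 1 discordant pair
```

A value of 1 means the metric orders every pair of segments the same way as the human judgements, and a value of -1 means it reverses every pair.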