Teaching the Pre-trained Model to Generate Simple Texts for Text Simplification
About
Randomly masking text spans in ordinary texts during pre-training hardly teaches models to generate simple texts, and this can hurt the performance of pre-trained models on text simplification tasks. In this paper, we propose a new continued pre-training strategy that teaches the pre-trained model to generate simple texts. We continue pre-training BART, a representative model, to obtain SimpleBART. SimpleBART consistently and significantly improves over BART on lexical simplification, sentence simplification, and document-level simplification. Finally, we compare SimpleBART with several representative large language models (LLMs).
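One way to picture the idea of continued pre-training toward simple text is to bias the denoising objective so that the model is asked to regenerate *simple* words rather than arbitrary spans. The sketch below is a hypothetical, lexicon-based illustration of that idea (the function name, the `simple_vocab` lexicon, and the masking rule are assumptions for illustration; the paper's actual masking strategy may differ):

```python
import random

MASK = "<mask>"  # BART-style mask token

def mask_simple_words(tokens, simple_vocab, mask_prob=0.5, rng=None):
    """Replace tokens found in a simple-word lexicon with a mask token,
    so a seq2seq denoiser is trained to regenerate simple words.
    Hypothetical sketch, not the paper's exact procedure."""
    rng = rng or random.Random(0)
    masked = []
    for tok in tokens:
        if tok.lower() in simple_vocab and rng.random() < mask_prob:
            masked.append(MASK)
        else:
            masked.append(tok)
    return masked

# Toy usage: the target side of a training pair is the original simple
# sentence; the source side has the simple words masked out.
simple_vocab = {"easy", "help"}
tokens = "This easy guide can help you".split()
source = mask_simple_words(tokens, simple_vocab, mask_prob=1.0)
print(" ".join(source))  # → "This <mask> guide can <mask> you"
```

Training on such pairs would push the decoder toward producing words from the simple lexicon, which is the intuition behind teaching a pre-trained model to generate simple texts.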
Renliang Sun, Wei Xu, Xiaojun Wan · 2023
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Sentence Simplification | Newsela (test) | SARI | 41.6 | 61 |
| Sentence Simplification | TurkCorpus English (test) | SARI | 39.5 | 41 |
| Lexical Simplification | LexMTurk (test) | F1 Score | 28.5 | 7 |
| Sentence Simplification | Human Evaluation 100-sentence sample (test) | Simplicity | 3.62 | 7 |
| Lexical Simplification | BenchLS (test) | F1 Score | 27.8 | 7 |
| Document Simplification | D-Wikipedia (test) | D-SARI | 41.64 | 4 |
| Document-level Text Simplification | D-Wikipedia (test) | D-SARI | 41.64 | 4 |
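The SARI scores above measure how well a system keeps, adds, and deletes words relative to the source and reference simplifications. The following is a toy, unigram-only illustration of those three components (the function `toy_sari` and the single-reference setup are simplifying assumptions; the official metric scores n-grams up to order 4 over multiple references, so real evaluation should use a standard toolkit such as EASSE):

```python
def f1(prec, rec):
    # Harmonic mean of precision and recall, 0 when both are 0.
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

def toy_sari(source, output, reference):
    """Toy, unigram-only illustration of SARI's keep/add/delete scores.
    Not the official metric; for illustration of the idea only."""
    src, out, ref = set(source.split()), set(output.split()), set(reference.split())

    # KEEP: source words the reference retains vs. those the system retains.
    keep_gold, keep_sys = src & ref, src & out
    keep = f1(len(keep_sys & keep_gold) / len(keep_sys) if keep_sys else 0.0,
              len(keep_sys & keep_gold) / len(keep_gold) if keep_gold else 0.0)

    # ADD: new words (absent from the source) the reference introduces.
    add_gold, add_sys = ref - src, out - src
    add = f1(len(add_sys & add_gold) / len(add_sys) if add_sys else 0.0,
             len(add_sys & add_gold) / len(add_gold) if add_gold else 0.0)

    # DELETE: source words the reference drops vs. those the system drops.
    del_gold, del_sys = src - ref, src - out
    delete = f1(len(del_sys & del_gold) / len(del_sys) if del_sys else 0.0,
                len(del_sys & del_gold) / len(del_gold) if del_gold else 0.0)

    return 100 * (keep + add + delete) / 3
```

For example, `toy_sari("the cat sat", "the cat", "the cat")` gets full keep and delete credit but no add credit, since the reference adds no new words. D-SARI, used for the D-Wikipedia rows, extends this idea to document-level simplification.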