Pushing on Text Readability Assessment: A Transformer Meets Handcrafted Linguistic Features

About

We report two essential improvements in readability assessment: (1) three novel features in advanced semantics and (2) timely evidence that traditional ML models (e.g., Random Forest, using handcrafted features) can be combined with transformers (e.g., RoBERTa) to improve model performance. First, we explore suitable transformers and traditional ML models. Then, we extract 255 handcrafted linguistic features using self-developed extraction software. Finally, we assemble these into several hybrid models, achieving state-of-the-art (SOTA) accuracy on popular datasets in readability assessment. The use of handcrafted features helps model performance on smaller datasets. Notably, our RoBERTA-RF-T1 hybrid achieves a near-perfect classification accuracy of 99%, a 20.3% increase over the previous SOTA.

Bruce W. Lee, Yoo Sung Jang, Jason Hyung-Jong Lee • 2021
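
The hybrid idea in the abstract (a transformer combined with a traditional ML model over handcrafted features) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the transformer's per-class probabilities are concatenated with each document's handcrafted feature vector and the joined rows are passed to a classifier exposing scikit-learn style fit and predict methods. The function names build_hybrid_features and fit_hybrid, and all numbers, are hypothetical.

```python
from typing import List, Sequence


def build_hybrid_features(
    transformer_probs: Sequence[Sequence[float]],
    handcrafted: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Join, per document, the transformer's class probabilities with
    that document's handcrafted linguistic features (assumed design)."""
    if len(transformer_probs) != len(handcrafted):
        raise ValueError("both inputs must describe the same documents")
    return [list(p) + list(h) for p, h in zip(transformer_probs, handcrafted)]


def fit_hybrid(clf, transformer_probs, handcrafted, labels):
    """Train any classifier with a fit(X, y) method on the joined rows.

    clf is duck-typed; a Random Forest is one option, as in the abstract.
    """
    X = build_hybrid_features(transformer_probs, handcrafted)
    clf.fit(X, labels)
    return clf


# Hypothetical toy inputs: two documents, three readability classes,
# and two handcrafted features per document.
probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
feats = [[12.5, 0.43], [21.0, 0.61]]
hybrid_rows = build_hybrid_features(probs, feats)
```

Any estimator with fit and predict can be passed as clf, for example RandomForestClassifier from scikit-learn. Whether the paper joins probabilities, logits, or embeddings with the handcrafted features is not stated in the abstract, so that choice is an assumption here.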

Related benchmarks

Task	Dataset	Result
Readability Classification	WeeBit (test)	Accuracy 90.5	13
Readability Assessment	OneStopE (test)	Accuracy 99	6
Readability Assessment	OneStopE	Accuracy 99	6
Readability Assessment	WeeBit	Accuracy 90.5	6
Readability Assessment	Cambridge (test)	Accuracy 76.3	5
Readability Assessment	Cambridge	Accuracy 76.3	5
