
Zero-shot Large Language Models for Automatic Readability Assessment

About

Unsupervised automatic readability assessment (ARA) methods have important practical and research applications (e.g., ensuring medical or educational materials are suitable for their target audiences). In this paper, we propose a new zero-shot prompting methodology for ARA and present the first comprehensive evaluation of using large language models (LLMs) as an unsupervised ARA method by testing 10 diverse open-source LLMs (e.g., different sizes and developers) on 14 diverse datasets (e.g., different text lengths and languages). Our findings show that our proposed prompting methodology outperforms prior methods on 13 of the 14 datasets. Furthermore, we propose LAURAE, which combines LLM and readability formula scores to improve robustness by capturing both contextual and shallow (e.g., sentence length) features of readability. Our evaluation demonstrates that LAURAE robustly outperforms prior methods across languages, text lengths, and amounts of technical language.
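The abstract does not say how LAURAE combines the LLM and readability formula scores. The Python sketch below is only a rough illustration under several assumptions. It assumes the LLM returns one numeric difficulty rating per text, with higher meaning harder. It uses the Flesch-Kincaid Grade Level as the readability formula, with a crude vowel-group syllable counter. It z-normalizes both score lists and takes a weighted mean controlled by a hypothetical weight `alpha`. The function names, the choice of formula, and the weighting scheme are illustrative assumptions, not the paper's implementation.

```python
import re
from statistics import mean, pstdev


def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of consecutive vowels, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level (a shallow formula); higher means harder."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def zscore(values: list[float]) -> list[float]:
    """Standardize scores so the LLM and formula outputs share one scale."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def combine_scores(llm_scores: list[float], texts: list[str],
                   alpha: float = 0.5) -> list[float]:
    """Hypothetical LAURAE-style ensemble: weighted mean of z-normalized
    LLM difficulty ratings and formula scores (higher = harder)."""
    formula = [flesch_kincaid_grade(t) for t in texts]
    z_llm = zscore(llm_scores)
    z_formula = zscore(formula)
    return [alpha * a + (1 - alpha) * b for a, b in zip(z_llm, z_formula)]


texts = [
    "The cat sat on the mat.",
    "Notwithstanding considerable methodological heterogeneity, "
    "the investigators concluded that the intervention was efficacious.",
]
llm_scores = [1.0, 8.0]  # hypothetical LLM difficulty ratings
combined = combine_scores(llm_scores, texts)
```

In this sketch, z-normalization puts a free-form LLM rating on the same scale as a grade-level formula before averaging. Setting `alpha` closer to 1 would weight the contextual LLM signal more heavily, and setting it closer to 0 would weight the shallow features (sentence length, syllables per word) more heavily.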

Riley Grossman, Yi Chen • 2026

Related benchmarks

Task                    Dataset      Metric               Result  Rank
Readability Assessment  Greek Lang.  Pearson correlation  0.43    4
Readability Assessment  Greek Hist.  Pearson correlation  0.572   4
Readability Assessment  Vikidia fr   Pearson correlation  0.953   4
Readability Assessment  Vikidia      Pearson correlation  0.9     4
Readability Assessment  ASSET        Pearson correlation  0.629   4
Readability Assessment  CLEAR        Pearson correlation  0.735   4
Readability Assessment  OneStop      Pearson correlation  0.654   4
Readability Assessment  MedReadMe    Pearson correlation  0.77    4
Readability Assessment  ReadMe       Pearson correlation  0.798   4
Readability Assessment  ReadMe fr    Pearson correlation  0.75    4

(10 of 14 benchmark rows shown.)
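Every result above is a Pearson correlation. For an unsupervised ARA method, this is presumably computed between the method's predicted readability scores and a dataset's reference difficulty labels, though the listing does not state the exact setup. The minimal Python sketch below computes Pearson's r from scratch. The toy `gold` and `pred` values are made up for illustration and are not from the paper.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance numerator and per-variable spreads (the 1/n factors cancel).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


gold = [1, 2, 3, 4]            # hypothetical reference readability levels
pred = [0.1, 0.4, 0.35, 0.9]   # hypothetical model difficulty scores
r = pearson(gold, pred)
```

Because Pearson's r is invariant to shifting and scaling either variable, a method's raw scores do not need to be on the same scale as the gold labels. However, r only rewards a linear relationship. In practice, `scipy.stats.pearsonr(x, y)` computes the same statistic along with a p-value.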
