Zero-shot Large Language Models for Automatic Readability Assessment

About

Unsupervised automatic readability assessment (ARA) methods have important practical and research applications (e.g., ensuring medical or educational materials are suitable for their target audiences). In this paper, we propose a new zero-shot prompting methodology for ARA and present the first comprehensive evaluation of using large language models (LLMs) as an unsupervised ARA method by testing 10 diverse open-source LLMs (e.g., different sizes and developers) on 14 diverse datasets (e.g., different text lengths and languages). Our findings show that our proposed prompting methodology outperforms prior methods on 13 of the 14 datasets. Furthermore, we propose LAURAE, which combines LLM and readability formula scores to improve robustness by capturing both contextual and shallow (e.g., sentence length) features of readability. Our evaluation demonstrates that LAURAE robustly outperforms prior methods across languages, text lengths, and amounts of technical language.
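The abstract does not say how LAURAE combines the two kinds of scores, so the following is only a minimal sketch of one plausible way to do it, not the authors' method. It assumes the LLM readability scores have already been obtained from a zero-shot prompt and are passed in as plain numbers. The function names, the use of average sentence length as the shallow feature, and the z-score averaging rule are all illustrative assumptions.

```python
import re
from statistics import mean, pstdev


def avg_sentence_length(text: str) -> float:
    """Shallow feature (assumed for illustration): mean words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    return len(words) / max(len(sentences), 1)


def zscores(values):
    """Standardize a list of scores so different scales become comparable."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All scores are identical, so no text stands out on this signal.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combine_scores(llm_scores, formula_scores):
    """Hypothetical combination rule: average the two z-scored signals.

    This is NOT taken from the paper; it only illustrates the idea of
    mixing a contextual (LLM) score with a shallow (formula) score.
    """
    if len(llm_scores) != len(formula_scores):
        raise ValueError("score lists must have the same length")
    return [
        (a + b) / 2
        for a, b in zip(zscores(llm_scores), zscores(formula_scores))
    ]
```

Standardizing before averaging keeps either signal from dominating only because it is on a larger numeric scale. A higher combined value here means a harder text, provided both inputs are oriented that way.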

Riley Grossman, Yi Chen • 2026

Related benchmarks

Task	Dataset	Result
Readability Assessment	Greek Lang.	Pearson Correlation 0.43	4
Readability Assessment	Greek Hist.	Pearson Correlation 0.572	4
Readability Assessment	Vikidia fr	Pearson Correlation 0.953	4
Readability Assessment	Vikidia	Pearson Correlation 0.9	4
Readability Assessment	ASSET	Pearson Correlation 0.629	4
Readability Assessment	CLEAR	Pearson Correlation 0.735	4
Readability Assessment	OneStop	Pearson Correlation 0.654	4
Readability Assessment	MedReadMe	Pearson Correlation Coefficient 0.77	4
Readability Assessment	ReadMe	Pearson Correlation 0.798	4
Readability Assessment	ReadMe fr	Pearson Correlation 0.75	4

Showing 10 of 14 rows
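
The results above are reported as Pearson correlations, presumably between predicted readability scores and each dataset's reference labels. As a minimal sketch of how such a figure can be computed, the function below implements the standard Pearson formula in plain Python. The function name and argument names are illustrative and not taken from the paper.

```python
from math import sqrt


def pearson(predicted, reference):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_p = sum(predicted) / len(predicted)
    mean_r = sum(reference) / len(reference)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_p * var_r)
```

Because the coefficient is invariant to linear rescaling, predicted scores do not need to be on the same scale as the reference labels.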
