
Measuring and Modifying the Readability of English Texts with GPT-4

About

The success of Large Language Models (LLMs) in other domains has raised the question of whether LLMs can reliably assess and manipulate the readability of text. We approach this question empirically. First, using a published corpus of 4,724 English text excerpts, we find that readability estimates produced "zero-shot" from GPT-4 Turbo and GPT-4o mini exhibit relatively high correlation with human judgments (r = 0.76 and r = 0.74, respectively), outperforming estimates derived from traditional readability formulas and various psycholinguistic indices. Then, in a pre-registered human experiment (N = 59), we ask whether GPT-4 Turbo can reliably make text easier or harder to read. We find evidence to support this hypothesis, though considerable variance in human judgments remains unexplained. We conclude by discussing the limitations of this approach, including limited scope, as well as the validity of the "readability" construct and its dependence on context, audience, and goal.
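
To make the comparison in the abstract concrete, the following is a minimal sketch of how readability estimates could be scored against human judgments with a Pearson correlation, next to one traditional formula (Flesch Reading Ease). It is not the authors' code: the function names, the vowel-group syllable heuristic, and the toy numbers in the demo are all assumptions made for illustration.

```python
import math
import re


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    Higher scores indicate easier text.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    # Toy, made-up values for illustration only; not data from the paper.
    human_ratings = [1.0, 2.5, 3.0, 4.5]
    model_estimates = [1.2, 2.0, 3.5, 4.0]
    print("r =", pearson_r(model_estimates, human_ratings))
    print("FRE =", flesch_reading_ease("The cat sat. It was warm."))
```

In a setup like this, the same `pearson_r` call would be applied once to the model's estimates and once to the formula's scores, so the two correlations with human ratings can be compared directly.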

Sean Trott, Pamela D. Rivière (Department of Cognitive Science, University of California San Diego) • 2024

Related benchmarks

Task                    Dataset      Result                                 Rank
Readability Assessment  Cambridge    Pearson Correlation 0.888              4
Readability Assessment  Greek Lang.  Pearson Correlation 0.427              4
Readability Assessment  Greek Hist.  Pearson Correlation 0.52               4
Readability Assessment  CLEAR        Pearson Correlation 0.725              4
Readability Assessment  MedReadMe    Pearson Correlation Coefficient 0.736  4
Readability Assessment  ReadMe       Pearson Correlation 0.776              4
Readability Assessment  ReadMe ar    Pearson Correlation 0.523              4
Readability Assessment  Vikidia fr   Pearson Correlation 0.76               4
Readability Assessment  Vikidia      Pearson Correlation 0.827              4
Readability Assessment  ASSET        Pearson Correlation 0.324              4
Showing 10 of 14 rows
