
Improving Lexical Difficulty Prediction with Context-Aligned Contrastive Learning and Ridge Ensembling

About

Lexical difficulty prediction is a fundamental problem in language learning and readability assessment, requiring models to estimate word difficulty across different first-language (L1) backgrounds. However, existing approaches rely on regression-only training with scalar supervision, which does not explicitly structure the representation space and thus limits their ability to capture cross-lingual alignment and ordinal difficulty. To mitigate these issues, we propose Context-Aligned Contrastive Regression, which integrates a Ridge regression ensemble with two complementary objectives: Cross-View Context and Ordinal Soft Contrastive Learning. Experiments on three L1 datasets show that (i) contrastive objectives improve cross-lingual representation alignment while preserving language-specific nuances, (ii) the learned representations capture the ordinal structure of lexical difficulty, and (iii) the ensemble effectively mitigates systematic biases of individual models, leading to more stable performance across difficulty levels.
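
To make the ensembling idea concrete, here is a minimal sketch of one common way to combine several difficulty predictors with ridge regression: a ridge model is fit on the base models' predictions and evaluated with RMSE. The abstract does not specify how the ensemble is built, so this stacking setup, the toy data, and all names (`fit_ridge`, `model_a`, `model_b`, `alpha`) are illustrative assumptions, not the authors' implementation.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge on centered data; the intercept is not penalized."""
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations: (Xc^T Xc + alpha * I) w = Xc^T yc
    gram = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(d)]
    w = solve_linear(gram, rhs)
    b = y_mean - sum(wi * mi for wi, mi in zip(w, x_mean))
    return w, b

def predict(X, w, b):
    return [sum(wi * xi for wi, xi in zip(w, row)) + b for row in X]

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical toy data: gold difficulty scores and two biased base models.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
model_a = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]  # systematically over-predicts
model_b = [0.6, 1.4, 2.7, 3.5, 4.6, 5.4]  # systematically under-predicts

# Each row holds the base models' predictions for one word.
base_preds = [[a, b] for a, b in zip(model_a, model_b)]
w, b = fit_ridge(base_preds, y_true, alpha=0.1)
ensemble = predict(base_preds, w, b)

print("RMSE model_a :", rmse(y_true, model_a))
print("RMSE model_b :", rmse(y_true, model_b))
print("RMSE ensemble:", rmse(y_true, ensemble))
```

For brevity the sketch fits and evaluates on the same toy data; in practice the ridge weights would be fit on held-out (for example, cross-validated) base-model predictions to avoid an optimistic estimate.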

Wicaksono Leksono Muhamad, Joanito Agili Lopo, Tsamarah Rana Nugraha, Ahmad Cahyono Adi, Muhammad Oriza Nurfajri • 2026

Related benchmarks

Task                          | Dataset                       | Metric | Result | Rank
Lexical difficulty prediction | BEA German Closed Track 2026  | RMSE   | 0.997  | 4
Lexical difficulty prediction | BEA Chinese Closed Track 2026 | RMSE   | 0.88   | 4
Lexical difficulty prediction | BEA Spanish Closed Track 2026 | RMSE   | 1.063  | 4
