
Trait-Aware Policy Optimization for Autoregressive Multi-Trait Essay Scoring

About

Multi-trait essay scoring aims to provide fine-grained evaluation of writing quality across multiple dimensions. However, how to effectively post-train autoregressive scoring models remains underexplored. In this paper, we propose Trait-Aware Policy Optimization (TAPO), a post-training framework tailored to autoregressive multi-trait scoring. Our method decomposes rewards along both the sample and trait dimensions, combining global scoring consistency, trait-level accuracy, format validity, and inter-trait dependency preservation. In addition, we use enhanced prompts throughout training by incorporating original prompt texts and trait descriptions, providing richer semantic information for trait-specific score generation. Experiments across multiple backbone models show that our method consistently improves multi-trait scoring performance over supervised fine-tuning and scalar-reward optimization baselines, demonstrating the effectiveness and transferability of trait-aware post-training for essay scoring.
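The abstract names four reward components (global scoring consistency, trait-level accuracy, format validity, and inter-trait dependency preservation) but does not give their formulas here. The following is only a minimal sketch of how such a decomposed reward might be assembled, not the paper's implementation. Everything concrete in it is an assumption made for illustration: the `trait: score` output format, the function names, the distance-based accuracy terms, the pairwise-ordering proxy for inter-trait dependency, and the weights.

```python
import re
from itertools import combinations
from typing import Dict, Optional, Sequence


def parse_scores(text: str, traits: Sequence[str]) -> Optional[Dict[str, int]]:
    """Read one integer per trait from 'trait: score' text; None if any trait is missing."""
    scores = {}
    for trait in traits:
        match = re.search(rf"{re.escape(trait)}\s*:\s*(\d+)", text, flags=re.IGNORECASE)
        if match is None:
            return None
        scores[trait] = int(match.group(1))
    return scores


def sign(x: int) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def trait_aware_reward(output: str, gold: Dict[str, int], score_range: int = 5,
                       weights: Optional[Dict[str, float]] = None) -> dict:
    """Hypothetical decomposed reward; `gold` must be non-empty and score_range > 0."""
    # Assumed weights; the source does not state how the components are combined.
    weights = weights or {"format": 0.1, "global": 0.3, "trait": 0.4, "dependency": 0.2}
    traits = list(gold)
    pred = parse_scores(output, traits)
    if pred is None:
        # Format validity: an unparseable output earns no reward at all.
        return {"format": 0.0, "global": 0.0, "dependency": 0.0,
                "trait": {t: 0.0 for t in traits}, "total": 0.0}

    # Trait-level accuracy: one reward per trait, shrinking with absolute error.
    per_trait = {t: 1.0 - min(abs(pred[t] - gold[t]), score_range) / score_range
                 for t in traits}

    # Global consistency (sample level): closeness of the summed scores.
    max_total_gap = score_range * len(traits)
    total_gap = abs(sum(pred.values()) - sum(gold.values()))
    global_r = 1.0 - min(total_gap, max_total_gap) / max_total_gap

    # Inter-trait dependency proxy: fraction of trait pairs whose relative
    # ordering matches the gold ordering.
    pairs = list(combinations(traits, 2))
    if pairs:
        agree = sum(sign(pred[a] - pred[b]) == sign(gold[a] - gold[b]) for a, b in pairs)
        dependency = agree / len(pairs)
    else:
        dependency = 1.0

    trait_mean = sum(per_trait.values()) / len(traits)
    total = (weights["format"] * 1.0 + weights["global"] * global_r
             + weights["trait"] * trait_mean + weights["dependency"] * dependency)
    return {"format": 1.0, "global": global_r, "dependency": dependency,
            "trait": per_trait, "total": total}
```

Returning the per-trait rewards alongside the scalar total keeps the trait dimension visible to the optimizer, which is the property the abstract contrasts with scalar-reward baselines.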

Zhengyang Wang, Sanwoo Lee, Jiaxin Wang, Chenxi Miao, Weikang Li, Yunfang Wu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Automated essay scoring | ASAP and ASAP++ (five-fold cross-validation) | Score P1: 0.73 | 11 |
| Trait-wise Automated Essay Scoring | ASAP and ASAP++ (five-fold cross-val) | Overall Score: 77.7 | 11 |
| Automated essay scoring | ASAP | QWK: 0.743 | 5 |
| Automated essay scoring | ASAP++ | QWK: 0.726 | 5 |
| Automated essay scoring | Feedback Prize (test) | QWK (Cohesion): 0.603 | 4 |
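QWK in the results above presumably denotes quadratic weighted kappa, the agreement metric commonly reported for essay scoring. As a reference point, here is a small pure-Python sketch of the standard definition. The function name and the explicit `min_rating`/`max_rating` arguments are choices made for this example, and it assumes integer ratings with at least two possible rating values.

```python
from typing import Sequence


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int],
                             min_rating: int, max_rating: int) -> float:
    """Quadratic weighted kappa between two integer rating sequences."""
    n = max_rating - min_rating + 1  # number of rating categories (assumed >= 2)

    # Observed confusion matrix: rows index true ratings, columns predictions.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1

    total = len(y_true)
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty grows with the squared distance between ratings.
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Expected count if true and predicted ratings were independent.
            expected = hist_true[i] * hist_pred[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        # Both raters used a single identical rating; treat as full agreement.
        return 1.0
    return 1.0 - numerator / denominator
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.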