
Rationale Behind Essay Scores: Enhancing S-LLM's Multi-Trait Essay Scoring with Rationale Generated by LLMs

About

Existing automated essay scoring (AES) has relied solely on essay text without using explanatory rationales for the scores, thereby forgoing an opportunity to capture, in a fine-grained manner, the specific aspects evaluated by rubric indicators. This paper introduces Rationale-based Multiple Trait Scoring (RMTS), a novel approach to multi-trait essay scoring that integrates prompt-engineering-based large language models (LLMs) with a fine-tuning-based essay scoring model built on a smaller large language model (S-LLM). RMTS uses an LLM-based trait-wise rationale generation system in which a separate LLM agent generates trait-specific rationales based on rubric guidelines; the scoring model then uses these rationales to accurately predict multi-trait scores. Extensive experiments on benchmark datasets, including ASAP, ASAP++, and Feedback Prize, show that RMTS significantly outperforms state-of-the-art models and vanilla S-LLMs in trait-specific scoring. By complementing quantitative assessment with fine-grained qualitative rationales, RMTS improves trait-wise reliability and provides partial explanations for the assigned scores. The code is available at https://github.com/BBeeChu/RMTS.git.
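The two-stage pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the rubric snippets, function names, and trait set are hypothetical, and the LLM agent call is stubbed with a placeholder string so the sketch stays self-contained.

```python
# Sketch of an RMTS-style pipeline: (1) an LLM agent produces a
# trait-specific rationale from rubric guidelines, (2) the essay and
# rationales are combined into the input for the S-LLM scorer.
# TRAIT_RUBRICS and all function names are illustrative assumptions.

TRAIT_RUBRICS = {
    "organization": "Logical structure and transitions between ideas.",
    "conventions": "Grammar, spelling, and punctuation accuracy.",
}

def rationale_prompt(essay: str, trait: str, rubric: str) -> str:
    """Prompt sent to the rationale-generating LLM agent for one trait."""
    return (
        f"Rubric for '{trait}': {rubric}\n"
        f"Essay:\n{essay}\n"
        "Explain, with respect to the rubric, how well the essay "
        "satisfies this trait."
    )

def scoring_input(essay: str, rationales: dict[str, str]) -> str:
    """Concatenate the essay with trait-wise rationales for the scorer."""
    parts = [f"Essay:\n{essay}"]
    for trait, rationale in rationales.items():
        parts.append(f"[{trait} rationale] {rationale}")
    return "\n".join(parts)

if __name__ == "__main__":
    essay = "Recycling reduces waste. Also it saves energy."
    # In RMTS this comes from the LLM agent; here it is stubbed.
    rationales = {
        trait: f"(LLM-generated rationale for {trait})"
        for trait in TRAIT_RUBRICS
    }
    print(scoring_input(essay, rationales))
```

The scoring model would be fine-tuned to map such rationale-augmented inputs to per-trait scores; only the input-assembly step is shown here.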

SeongYeub Chu, JongWoo Kim, Bryan Wong, MunYong Yi • 2024

Related benchmarks

Task                                  Dataset                        Result               Rank
Automated essay scoring               ASAP++ full-data setting       Score (P1) 0.716     10
Multi-trait automated essay scoring   ASAP++ full-data setting       Overall score 0.755  10
Automated essay scoring               ASAP++ 32-data setting (test)  QWK (P1) 0.479       6
Multi-trait automated essay scoring   ASAP++ 32-data setting (test)  Overall score 0.494  6
