TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge
About
The LLM-as-a-judge paradigm uses large language models (LLMs) for automated text evaluation, where an LLM assigns a numerical assessment to the input text following scoring rubrics. Existing methods for LLM-as-a-judge use cross-entropy (CE) loss for fine-tuning, which neglects the numeric nature of score prediction. Recent work addresses the numerical prediction limitations of LLM fine-tuning through regression-aware fine-tuning, which, however, does not consider chain-of-thought (CoT) reasoning for score prediction. In this paper, we introduce TRACT (Two-stage Regression-Aware fine-tuning with CoT), a method combining CoT reasoning with regression-aware training. TRACT consists of two stages: first, a seed LLM is fine-tuned to generate CoTs, which then serve as supervision for the second-stage fine-tuning. The training objective of TRACT combines the CE loss for learning the CoT reasoning capabilities and the regression-aware loss for the score prediction. Experiments across four LLM-as-a-judge datasets and two LLMs show that TRACT significantly outperforms existing methods. Extensive ablation studies validate the importance of each component in TRACT.
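The combined objective described above can be illustrated with a small sketch. The abstract does not give the exact formula, so the details here are assumptions: the regression-aware term is taken to be the squared error between the true score and the expected score under the model's distribution over candidate score tokens, and the two terms are summed with a hypothetical `weight` parameter. All function names are illustrative and are not taken from the paper's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cot_cross_entropy(token_logits, target_ids):
    """Mean negative log-likelihood of the reference CoT tokens."""
    nll = 0.0
    for logits, target in zip(token_logits, target_ids):
        nll -= math.log(softmax(logits)[target])
    return nll / len(target_ids)


def expected_score(score_logits, score_values):
    """Expected score under the distribution over candidate score tokens."""
    probs = softmax(score_logits)
    return sum(p * v for p, v in zip(probs, score_values))


def combined_loss(token_logits, target_ids, score_logits, score_values,
                  true_score, weight=1.0):
    """CE on the CoT tokens plus a squared-error term on the score.

    Assumption: the regression-aware term is a squared error on the
    expected score, and `weight` is a hypothetical balancing factor.
    """
    ce = cot_cross_entropy(token_logits, target_ids)
    reg = (expected_score(score_logits, score_values) - true_score) ** 2
    return ce + weight * reg
```

Unlike a pure CE objective on the score token, the squared-error term penalizes a prediction of 1 more than a prediction of 4 when the true score is 5, which is the numeric structure the abstract says CE neglects.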
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Reward Modeling | RewardBench v1.0 (test) | Chat Score | 0.927 | 27 |
| LLM-as-a-judge evaluation | FLASK | Pearson's r | 0.518 | 16 |
| LLM-as-a-judge evaluation | Vicuna-bench | Pearson Correlation (r) | 0.605 | 16 |
| LLM-as-a-judge evaluation | MT-Bench | Pearson's r | 0.672 | 16 |
| LLM-as-a-judge evaluation | FB Bench (Feedback Bench) | Pearson's r | 0.931 | 16 |
| Feedback Evaluation Alignment | MT-Bench | Kendall's Tau | 0.494 | 11 |
| Feedback Evaluation Alignment | Vicuna-bench | Kendall's Tau | 0.423 | 6 |
| Feedback Evaluation Alignment | Feedback Bench | Kendall's Tau | 82 | 6 |
| Feedback Evaluation Alignment | FLASK | Kendall's Tau | 0.373 | 6 |
| Feedback Evaluation | Feedback Bench (test) | Kendall's Tau | 0.805 | 5 |