
Rethinking Supervised Fine-Tuning: Emphasizing Key Answer Tokens for Improved LLM Accuracy

About

With the rapid advancement of Large Language Models (LLMs), the Chain-of-Thought (CoT) component has become significant for complex reasoning tasks. However, in conventional Supervised Fine-Tuning (SFT), the model may allocate disproportionate attention to the excessively long CoT sequences. This reduces focus on the much shorter but essential Key portion (the final answer), whose correctness directly determines task success and evaluation quality. To address this limitation, we propose SFTKey, a two-stage training scheme. In the first stage, conventional SFT is applied to ensure proper output format; in the second stage, only the Key portion is fine-tuned to improve accuracy. Extensive experiments across multiple benchmarks and model families demonstrate that SFTKey achieves an average accuracy improvement exceeding 5% over conventional SFT, while preserving the ability to generate correctly formatted outputs. Overall, this study advances LLM fine-tuning by explicitly balancing CoT learning with additional optimization on answer-relevant tokens.
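The abstract does not give implementation details, but the second stage can be understood as restricting the training loss to the final-answer tokens. A minimal sketch of that idea, assuming the common convention of masking ignored positions with the index -100 (as in PyTorch-style cross-entropy) and a hypothetical helper name:

```python
# Hypothetical sketch of SFTKey's stage-2 loss masking (function name and
# details are assumptions, not taken from the paper).
# Stage 2 computes the loss only on the Key (final-answer) span; all
# preceding CoT tokens are masked with the conventional ignore index -100.

IGNORE_INDEX = -100

def mask_cot_labels(token_ids, key_start):
    """Return a label sequence in which every token before `key_start`
    is replaced by IGNORE_INDEX, so only the Key span contributes
    to the training loss."""
    return [IGNORE_INDEX if i < key_start else tok
            for i, tok in enumerate(token_ids)]

# Example: a 6-token target whose last two tokens form the final answer.
labels = mask_cot_labels([11, 12, 13, 14, 15, 16], key_start=4)
print(labels)  # [-100, -100, -100, -100, 15, 16]
```

Under this sketch, stage 1 would train on the full label sequence and stage 2 on the masked one, which matches the paper's description of fine-tuning only the Key portion after format learning.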

Xiaofeng Shi, Qian Kou, Yuduo Li, Hua Zhou • 2025

Related benchmarks

Task                        | Dataset           | Metric          | Result  | Rank
Mathematical Reasoning      | GSM8K             | Composite Score | 88.16   | 20
Mathematical Reasoning      | OpenR1 Math 220k  | Composite Score | 0.8633  | 20
Question Answering          | OpenBookQA        | Composite Score | 92.14   | 20
Chain-of-Thought Reasoning  | CoT-Collection    | Composite Score | 73.42   | 20
