
Rethinking Supervised Fine-Tuning: Emphasizing Key Answer Tokens for Improved LLM Accuracy

About

With the rapid advancement of Large Language Models (LLMs), the Chain-of-Thought (CoT) component has become significant for complex reasoning tasks. However, in conventional Supervised Fine-Tuning (SFT), the model may allocate disproportionate attention to the excessively long CoT sequences. This reduces focus on the much shorter but essential Key portion (the final answer), whose correctness directly determines task success and evaluation quality. To address this limitation, we propose SFTKey, a two-stage training scheme. In the first stage, conventional SFT is applied to ensure proper output format; in the second stage, only the Key portion is fine-tuned to improve accuracy. Extensive experiments across multiple benchmarks and model families demonstrate that SFTKey achieves an average accuracy improvement exceeding 5% over conventional SFT, while preserving the ability to generate correctly formatted outputs. Overall, this study advances LLM fine-tuning by explicitly balancing CoT learning with additional optimization on answer-relevant tokens.
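The abstract does not give implementation details, but the second stage can be understood as restricting the training loss to the final-answer tokens. A minimal sketch of that idea, assuming the common convention of masking ignored positions with the index -100 (as in PyTorch-style cross-entropy) and a hypothetical helper name:

```python
# Hypothetical sketch of SFTKey's stage-2 loss masking (function name and
# details are assumptions, not taken from the paper).
# Stage 2 computes the loss only on the Key (final-answer) span; all
# preceding CoT tokens are masked with the conventional ignore index -100.

IGNORE_INDEX = -100

def mask_cot_labels(token_ids, key_start):
    """Return a label sequence in which every token before `key_start`
    is replaced by IGNORE_INDEX, so only the Key span contributes
    to the training loss."""
    return [IGNORE_INDEX if i < key_start else tok
            for i, tok in enumerate(token_ids)]

# Example: a 6-token target whose last two tokens form the final answer.
labels = mask_cot_labels([11, 12, 13, 14, 15, 16], key_start=4)
print(labels)  # [-100, -100, -100, -100, 15, 16]
```

Under this sketch, stage 1 would train on the full label sequence and stage 2 on the masked one, which matches the paper's description of fine-tuning only the Key portion after format learning.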

Xiaofeng Shi, Qian Kou, Yuduo Li, Hua Zhou • 2025

Related benchmarks

Task                        | Dataset           | Metric          | Result  | Rank
Mathematical Reasoning      | GSM8K             | Composite Score | 88.16   | 20
Mathematical Reasoning      | OpenR1 Math 220k  | Composite Score | 0.8633  | 20
Question Answering          | OpenBookQA        | Composite Score | 92.14   | 20
Chain-of-Thought Reasoning  | CoT-Collection    | Composite Score | 73.42   | 20
