
ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning

About

Data quality plays a critical role in enhancing supervised fine-tuning (SFT) for large language models (LLMs), and token-level data selection has emerged as a promising direction because of its fine-grained nature. Despite their strong empirical performance, existing token-level selection methods share two key limitations: (1) they require training or accessing an additional reference model, and (2) they rely solely on loss information for token selection, which fails to preserve semantically important tokens that loss-based metrics do not favor. To address these challenges, we propose ssToken, a Self-modulated and Semantic-aware Token Selection approach. ssToken leverages readily accessible history models to compute the per-token loss difference with the current model. This difference serves as a self-modulated signal that lets the model adaptively select tokens along its optimization trajectory, rather than relying on excess loss from an offline-trained reference model as in prior works. We further introduce a semantic-aware, attention-based token importance estimation metric that is orthogonal to loss-based selection and provides complementary semantic information for more effective filtering. Extensive experiments across different model families and scales demonstrate that self-modulated selection and semantic-aware selection each outperform full-data fine-tuning on their own, while their integration, ssToken, achieves synergistic gains and further surpasses prior token-level selection methods, delivering performance improvements while maintaining training efficiency.
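The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sign of the loss difference (history loss minus current loss), the min-max normalization, the linear mixing weight `gamma`, and the `keep_ratio` top-k rule are all assumptions made for illustration, since the abstract does not specify them. The per-token losses and attention scores are taken as given inputs.

```python
from typing import List


def min_max(values: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_tokens(
    history_loss: List[float],     # per-token loss under a history model
    current_loss: List[float],     # per-token loss under the current model
    attention_score: List[float],  # per-token attention-based importance
    keep_ratio: float = 0.5,       # fraction of tokens to keep (assumed)
    gamma: float = 0.5,            # loss vs. semantic weight (assumed)
) -> List[int]:
    """Return a 0/1 mask marking the tokens kept for the training loss."""
    n = len(current_loss)
    if not (len(history_loss) == n == len(attention_score)):
        raise ValueError("all per-token inputs must have the same length")
    if n == 0:
        return []

    # Self-modulated signal: per-token loss difference between the history
    # model and the current model (sign convention is an assumption).
    loss_diff = [h - c for h, c in zip(history_loss, current_loss)]

    # Combine the loss-based and semantic signals on a common [0, 1] scale
    # (linear mixing is an assumption, not taken from the source).
    combined = [
        gamma * l + (1.0 - gamma) * a
        for l, a in zip(min_max(loss_diff), min_max(attention_score))
    ]

    # Keep the top-scoring fraction of tokens, at least one.
    k = max(1, int(keep_ratio * n))
    top = sorted(range(n), key=lambda i: combined[i], reverse=True)[:k]

    mask = [0] * n
    for i in top:
        mask[i] = 1
    return mask
```

In a training loop, a mask like this would typically multiply the per-token cross-entropy so that only the selected tokens contribute to the gradient. Setting `gamma=1.0` reduces the sketch to purely loss-based selection, and `gamma=0.0` to purely attention-based selection.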

Xiaohan Qin, Xiaoxing Wang, Ning Liao, Cancheng Zhang, Xiangdong Zhang, Mingquan Feng, Jingzhi Wang, Junchi Yan • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| Commonsense Reasoning | HellaSwag | HellaSwag Accuracy | 54.66 | 711 |
| Code Generation | HumanEval | HumanEval Score | 75.21 | 128 |
| General Capability Evaluation | General Capability Suite (MMLU, GSM8K, HumanEval, IFEval) | Common Average Score | 70.08 | 39 |
| General Capability Evaluation | General Capability Suite (ARC-C, HellaSwag, MMLU, GSM8K) | ARC-C Accuracy | 51.39 | 27 |
| Science Question Answering | ARC-C | Accuracy (ARC-C) | 49.74 | 25 |
| Multi-task Language Understanding | MMLU | Accuracy | 56.62 | 15 |
| Code Generation | HumanEval | Accuracy | 31.16 | 9 |
| Mathematical Reasoning | GSM8K | Accuracy | 63.6 | 9 |
