KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning

About

Recent advances have demonstrated that integrating reinforcement learning with rule-based rewards can significantly enhance the reasoning capabilities of large language models, even without supervised fine-tuning. However, prevalent reinforcement learning algorithms such as GRPO, and variants of it such as DAPO, compute the advantage at too coarse a granularity. Specifically, they compute rollout-level advantages that assign an identical value to every token within a sequence, which fails to capture token-specific contributions and hinders effective learning. To address this limitation, we propose Key-token Advantage Estimation (KTAE), a novel algorithm that estimates fine-grained, token-level advantages without introducing additional models. KTAE leverages the correctness of sampled rollouts and applies statistical analysis to quantify how much each token within a sequence contributes to the final outcome. This quantified token-level importance is then combined with the rollout-level advantage to obtain a finer-grained, token-level advantage estimate. Empirical results show that models trained with GRPO+KTAE and DAPO+KTAE outperform baseline methods across five mathematical reasoning benchmarks. Notably, they achieve higher accuracy with shorter responses and even surpass R1-Distill-Qwen-1.5B using the same base model.
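
To make the granularity issue concrete, the sketch below first computes a group-normalized, rollout-level advantage in the style of GRPO, then adjusts it per token. The abstract does not specify which statistical analysis KTAE uses or how the two quantities are combined, so everything token-level here is an assumption for illustration only: the importance score (the difference in a token's occurrence rate between correct and incorrect rollouts), the additive combination, the `weight` parameter, and all function names are hypothetical and are not the authors' method.

```python
from statistics import mean, pstdev


def rollout_advantages(rewards):
    """GRPO-style advantage: normalize each reward within its group of rollouts.

    Assumption: population standard deviation is used for normalization.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All rollouts received the same reward, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def token_importance(rollouts, correct):
    """Hypothetical importance score: P(token | correct) - P(token | incorrect)."""
    n_pos = sum(correct)
    n_neg = len(correct) - n_pos
    vocab = {tok for toks in rollouts for tok in toks}
    scores = {}
    for tok in vocab:
        pos = sum(1 for toks, c in zip(rollouts, correct) if c and tok in toks)
        neg = sum(1 for toks, c in zip(rollouts, correct) if not c and tok in toks)
        p_pos = pos / n_pos if n_pos else 0.0
        p_neg = neg / n_neg if n_neg else 0.0
        scores[tok] = p_pos - p_neg
    return scores


def token_level_advantages(rollouts, rewards, weight=0.5):
    """Combine rollout-level advantage with token importance (additive: an assumption)."""
    correct = [r > 0 for r in rewards]
    adv = rollout_advantages(rewards)
    imp = token_importance(rollouts, correct)
    return [[a + weight * imp[tok] for tok in toks] for toks, a in zip(rollouts, adv)]


# Four sampled rollouts for one prompt, with rule-based 0/1 rewards.
rollouts = [["x", "=", "4"], ["x", "=", "5"], ["so", "x", "=", "4"], ["x", "=", "7"]]
rewards = [1, 0, 1, 0]

print(rollout_advantages(rewards))                 # one value per rollout, shared by all its tokens
print(token_level_advantages(rollouts, rewards))   # one value per token
```

In this toy group, the rollout-level advantage gives every token in a correct rollout the same value, whereas the token-level variant raises the value of "4" (which appears only in correct rollouts) and leaves "x" and "=" (which appear in every rollout) unchanged.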

Wei Sun, Wen Yang, Pu Jian, Qianlong Du, Fuwei Cui, Shuo Ren, Jiajun Zhang • 2025

Related benchmarks

Task                   | Dataset                                            | Result             | Rank
Mathematical Reasoning | GSM8K                                              | --                 | 351
Mathematical Reasoning | Minerva                                            | --                 | 138
Mathematical Reasoning | AMC 23                                             | Pass@1 65.1        | 46
Scientific Reasoning   | Science Domain In-Domain: SampleQA, GPQA(ALL), HLE | SampleQA Score 3.17 | 18
Mathematical Reasoning | Math MATH500, AIME24, Minerva-Math, AMC23          | MATH500 Score 82.2 | 18
Scientific Reasoning   | GPQA                                               | Pass@16 91.52      | 16
Mathematical Reasoning | AMC 2023                                           | Pass@16 97.5       | 16
Mathematical Reasoning | Olympiad                                           | Pass@16 86.8       | 16
Mathematical Reasoning | AIME 2025                                          | P@1 15.2           | 13
Mathematical Reasoning | AIME 2024                                          | P@1 33.3           | 13
Showing 10 of 15 rows
