Beyond Uniform Credit Assignment: Selective Eligibility Traces for RLVR
About
Reinforcement Learning with Verifiable Rewards (RLVR) has become a key approach for improving the reasoning abilities of large language models. However, widely used critic-free algorithms such as Group Relative Policy Optimization (GRPO) rely on a "uniform credit assignment" assumption that indiscriminately broadcasts trajectory-level advantages to every token, which hinders learning efficiency by failing to distinguish critical reasoning steps. To address this limitation, we propose Selective Eligibility Traces (S-trace). Grounded in the intuition of partial trust region preservation, we first introduce P-trace, a sample-efficient, critic-free eligibility traces method. Building on it, S-trace implements a sparse eligibility traces mechanism that further mitigates variance and achieves fine-grained credit assignment by selectively masking low-entropy tokens. Theoretically, we contextualize the recent Group Sequence Policy Optimization (GSPO) method within the critic-free eligibility traces framework, identifying it as a special instance of the eligibility traces method operating under uniform credit assignment. Experiments demonstrate that S-trace not only outperforms GRPO in average pass@16, with gains of 0.49% on Qwen3-1.7B, 3.16% on Qwen3-4B, and a robust 2.98% when scaled further to Qwen3-8B, but also achieves this with higher sample and token efficiency.
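To make the contrast in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It shows GRPO-style group-normalized advantages broadcast uniformly to every token, next to a hypothetical variant that zeroes the credit on low-entropy tokens. The function names, the use of the population standard deviation, and the fixed entropy `threshold` are all assumptions made for illustration. The actual P-trace and S-trace updates involve eligibility traces that are not reproduced here.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each trajectory reward within its sampled group.

    Assumption: population std is used; real implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def uniform_credit(advantage, num_tokens):
    """Uniform credit assignment: every token gets the same advantage."""
    return [advantage] * num_tokens


def entropy_masked_credit(advantage, token_dists, threshold):
    """Hypothetical selective variant: drop credit on low-entropy tokens."""
    return [
        advantage if token_entropy(dist) >= threshold else 0.0
        for dist in token_dists
    ]


# Four sampled responses to one prompt, with verifiable 0/1 rewards.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# Toy next-token distributions for a three-token response.
dists = [[0.98, 0.01, 0.01], [0.4, 0.3, 0.3], [1.0, 0.0, 0.0]]

uniform = uniform_credit(advantages[0], len(dists))
masked = entropy_masked_credit(advantages[0], dists, threshold=0.5)
```

In this toy example only the second token, whose distribution is far from deterministic, keeps its credit under the masked variant; the uniform variant assigns the same advantage to all three tokens.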
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | Minerva | Avg@16 | 33.28 | 42 |
| Mathematical Reasoning | AMC23 | Pass@k | 57.5 | 35 |
| Mathematical Reasoning | AIME24 | Pass@16 | 30 | 30 |
| Mathematical Reasoning | AIME 25 | Pass@16 | 42.07 | 22 |
| Mathematical Reasoning | AIME 25 | pass@16 | 36.2 | 6 |
| Mathematical Reasoning | MATH 500 | pass@16 Success Rate | 90.49 | 3 |
| Mathematical Reasoning | AIME 24 | Pass@16 | 54.82 | 3 |
| Mathematical Reasoning | BeyondAIME | Pass@16 | 27.84 | 3 |
| Mathematical Reasoning | AMC 23 | pass@16 | 90.62 | 3 |