GRPO-$\lambda$: Credit Assignment improves LLM Reasoning
About
Large language models (LLMs) are increasingly deployed for tasks requiring complex reasoning, prompting significant interest in improving their reasoning abilities through post-training. In particular, RL-based methods using verifiable rewards, such as the state-of-the-art GRPO, have been shown to substantially improve reasoning behaviors when applied as post-training methods. However, the lack of an explicit reward or critic model limits GRPO's ability to assign fine-grained credit across token sequences. In this work, we present GRPO-$\lambda$, a novel extension to GRPO that enhances credit assignment in RL fine-tuning of LLMs for complex reasoning tasks. We approximate learning from the $\lambda$-return with a reformulation of eligibility traces using token-level log-probabilities applied after each sequence generation, and a novel critic-free approximation of the temporal-difference error. We introduce a few variations for the weighting of the $\lambda$-return and their application to the eligibility trace, all of which provide significant gains over GRPO. We compare GRPO-$\lambda$ against GRPO by training models from 1.5B to 7B parameters on four different math reasoning datasets. The training plots demonstrate 30-40% improved performance during RL training on both LLaMA-3.1 and Qwen-2.5 architectures. Finally, we show that with GRPO-$\lambda$, the resulting average performance on AIME24, Math500, OlympiadMath, MinervaMath, and AMC improves over GRPO by more than 3 points, with a 4.5-point improvement on the 7B model.
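The credit-assignment gap described above can be illustrated with a minimal Python sketch. It is not the paper's formulation: it combines the standard group-normalized GRPO advantage with the weighting implied by classical accumulating eligibility traces when the only learning signal arrives at the final token. The function names, the use of the population standard deviation, and the `gamma` parameter are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each completion's reward within its group.

    Assumption: population standard deviation; implementations differ on this.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def trace_weights(num_tokens, lam, gamma=1.0):
    """Credit each token receives from a single terminal signal.

    With accumulating traces e_t = gamma * lam * e_{t-1} + grad log pi(a_t),
    a signal at the last step T-1 reaches token t scaled by
    (gamma * lam) ** (T - 1 - t).
    """
    decay = gamma * lam
    return [decay ** (num_tokens - 1 - t) for t in range(num_tokens)]


def token_level_advantages(rewards, lengths, lam, gamma=1.0):
    """Spread each sequence-level advantage over its tokens (hypothetical helper)."""
    advantages = group_relative_advantages(rewards)
    return [
        [adv * w for w in trace_weights(n, lam, gamma)]
        for adv, n in zip(advantages, lengths)
    ]


if __name__ == "__main__":
    # Two sampled completions for one prompt: one correct, one incorrect.
    for row in token_level_advantages([1.0, 0.0], lengths=[2, 3], lam=0.5):
        print(row)
```

With `lam=1.0` and `gamma=1.0` every weight equals 1, so each token receives the same sequence-level advantage, which is how plain GRPO distributes credit. Smaller values of `lam` concentrate credit on tokens closer to the end of the completion.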
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | Minerva | Avg@16 | 30.52 | 42 |
| Mathematical Reasoning | AMC23 | Pass@k | 60 | 35 |
| Mathematical Reasoning | AIME24 | Pass@16 | 38.83 | 30 |
| Mathematical Reasoning | AIME 25 | Pass@16 | 38.62 | 22 |
| Mathematical Reasoning | AIME 25 | pass@16 | 26.66 | 6 |
| Mathematical Reasoning | MATH 500 | pass@16 Success Rate | 90.11 | 3 |
| Mathematical Reasoning | AIME 24 | Pass@16 | 48.24 | 3 |
| Mathematical Reasoning | AMC 23 | pass@16 | 92.3 | 3 |
| Mathematical Reasoning | BeyondAIME | Pass@16 | 27.61 | 3 |