GRPO-$\lambda$: Credit Assignment improves LLM Reasoning

About

Large language models (LLMs) are increasingly deployed for tasks requiring complex reasoning, prompting significant interest in improving their reasoning abilities through post-training. In particular, RL-based methods with verifiable rewards, such as the state-of-the-art GRPO, have been shown to greatly improve reasoning behavior when applied as post-training methods. However, the lack of an explicit reward or critic model limits GRPO's ability to assign fine-grained credit across token sequences. In this work, we present GRPO-$\lambda$, a novel extension of GRPO that enhances credit assignment in RL fine-tuning of LLMs for complex reasoning tasks. We approximate learning from the $\lambda$-return with a reformulation of eligibility traces that uses token-level log-probabilities applied after each sequence generation, together with a novel critic-free approximation of the temporal-difference error. We introduce several variations of the $\lambda$-return weighting and of its application to the eligibility trace, all of which provide significant gains over GRPO. We compare GRPO-$\lambda$ against GRPO by training models from 1.5B to 7B parameters on $4$ different math reasoning datasets. The training curves show 30-40% improved performance during RL training on both LLaMA-3.1 and Qwen-2.5 architectures. Finally, we show that with GRPO-$\lambda$, the average performance on AIME24, Math500, OlympiadMath, MinervaMath, and AMC improves over GRPO by over $3$ points, and by $4.5$ points on the 7B model.

Prasanna Parthasarathi, Mathieu Reymond, Boxing Chen, Yufei Cui, Sarath Chandar• 2025
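The abstract does not spell out the GRPO-$\lambda$ update, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It computes the standard GRPO group-relative advantage (reward minus group mean, divided by group standard deviation; the population standard deviation is a choice made here) and then spreads that sequence-level advantage over tokens with a backward-view accumulating eligibility trace. It assumes the only nonzero TD error sits at the final token, so token k of a T-token completion receives weight (γλ)^(T-1-k). The function names and the γ, λ defaults are hypothetical; the authors' critic-free TD approximation and their weighting variants are not reproduced, and PPO-style clipping and the KL penalty are omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each completion's scalar reward
    against the mean and (population) std of its group for one prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # A group with identical rewards gets zero advantage for every member.
    return [(r - mu) / (sigma + eps) for r in rewards]


def trace_weights(num_tokens, gamma=1.0, lam=0.9):
    """Hypothetical per-token weights from an accumulating eligibility trace
    e_t = gamma * lam * e_{t-1} + grad log pi(a_t), assuming the only nonzero
    TD error occurs at the last token. The coefficient on token k's
    log-prob gradient is then (gamma * lam) ** (T - 1 - k)."""
    decay = gamma * lam
    return [decay ** (num_tokens - 1 - k) for k in range(num_tokens)]


def weighted_pg_objective(token_logprobs, advantage, gamma=1.0, lam=0.9):
    """Surrogate objective (to maximize): sum_k A * w_k * log pi(a_k).
    Illustrative only; no clipping, KL term, or length normalization."""
    weights = trace_weights(len(token_logprobs), gamma, lam)
    return sum(advantage * w * lp for w, lp in zip(weights, token_logprobs))


if __name__ == "__main__":
    # Four sampled completions for one prompt, verified as correct (1) or not (0).
    advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
    print(advs)
    print(trace_weights(3, gamma=1.0, lam=0.5))
    print(weighted_pg_objective([-1.0, -2.0, -3.0], advantage=2.0, gamma=1.0, lam=0.5))
```

With γ = λ = 1 every trace weight is 1, so the sketch collapses to the plain sequence-level GRPO weighting in which every token shares the same advantage; λ < 1 is what makes per-token credit differ by position.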

Related benchmarks

Task	Dataset	Metric	Result	
Mathematical Reasoning	Minerva	Avg@16	30.52	42
Mathematical Reasoning	BeyondAIME	Pass@16	27.61	39
Mathematical Reasoning	AMC23	Pass@k	60	35
Mathematical Reasoning	AIME24	Pass@16	38.83	30
Mathematical Reasoning	AIME 25	Pass@16	38.62	22
Mathematical Reasoning	AIME 25	Pass@16	26.66	6
Mathematical Reasoning	MATH 500	Pass@16 Success Rate	90.11	3
Mathematical Reasoning	AIME 24	Pass@16	48.24	3
Mathematical Reasoning	AMC 23	Pass@16	92.3	3
