GRPO-$\lambda$: Credit Assignment improves LLM Reasoning

About

Large language models (LLMs) are increasingly deployed for tasks requiring complex reasoning, prompting significant interest in improving their reasoning abilities through post-training. In particular, RL-based methods that use verifiable rewards, such as the state-of-the-art GRPO, have been shown to substantially improve reasoning behavior when applied as post-training methods. However, the lack of an explicit reward or critic model limits GRPO's ability to assign fine-grained credit across token sequences. In this work, we present GRPO-$\lambda$, a novel extension to GRPO that enhances credit assignment in RL fine-tuning of LLMs for complex reasoning tasks. We approximate learning from the $\lambda$-return with a reformulation of eligibility traces using token-level log-probabilities, applied after each sequence generation, and a novel critic-free approximation of the temporal-difference error. We introduce several variations of the weighting of the $\lambda$-return and of their application to the eligibility trace, all of which provide significant gains over GRPO. We compare GRPO-$\lambda$ against GRPO by training models from 1.5B to 7B parameters on $4$ different math reasoning datasets. The training plots show 30-40% improved performance during RL training on both LLaMA-3.1 and Qwen-2.5 architectures. Finally, we show that with GRPO-$\lambda$, the resulting average performance on AIME24, Math500, OlympiadMath, MinervaMath, and AMC improves over GRPO by more than $3$ points, with a $4.5$-point improvement on the 7B model.
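
To make the credit-assignment idea concrete, here is a minimal sketch. It is not the paper's formulation, which the abstract does not spell out. It combines the commonly used GRPO group-standardized advantage with a textbook simplification: if the reward arrives only at the final token and every value estimate is assumed to be zero (a crude critic-free stand-in), the $\lambda$-return at token $t$ of a length-$T$ sequence reduces to $(\gamma\lambda)^{T-1-t}$ times the terminal reward. The function names, the population standard deviation, and the default $\gamma$ and $\lambda$ values are all illustrative assumptions.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize rewards within one sampled group.

    Assumption: population standard deviation; implementations may differ.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def lambda_token_weights(num_tokens, gamma=1.0, lam=0.9):
    """Per-token weights (gamma * lam) ** (T - 1 - t) for t = 0..T-1.

    This is the lambda-return of a terminal-only reward when all value
    estimates are zero (an illustrative assumption, not the paper's
    critic-free TD approximation).
    """
    steps_to_end = np.arange(num_tokens - 1, -1, -1)
    return (gamma * lam) ** steps_to_end

def token_level_advantages(rewards, lengths, gamma=1.0, lam=0.9):
    """Spread each sequence-level advantage over that sequence's tokens."""
    advantages = group_relative_advantages(rewards)
    return [a * lambda_token_weights(n, gamma, lam)
            for a, n in zip(advantages, lengths)]

if __name__ == "__main__":
    # Hypothetical group of four completions with 0/1 verifiable rewards.
    per_token = token_level_advantages([1.0, 0.0, 0.0, 1.0], [3, 2, 4, 3], lam=0.5)
    for weights in per_token:
        print(np.round(weights, 3))
```

In this sketch, setting `lam=1.0` and `gamma=1.0` gives every token the same weight, which matches the uniform per-token credit of plain GRPO; smaller values of `lam` concentrate credit on tokens closer to the end of the sequence.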

Prasanna Parthasarathi, Mathieu Reymond, Boxing Chen, Yufei Cui, Sarath Chandar • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | Minerva | Avg@16 | 30.52 | 42 |
| Mathematical Reasoning | AMC23 | Pass@k | 60 | 35 |
| Mathematical Reasoning | AIME24 | Pass@16 | 38.83 | 30 |
| Mathematical Reasoning | AIME 25 | Pass@16 | 38.62 | 22 |
| Mathematical Reasoning | AIME 25 | pass@16 | 26.66 | 6 |
| Mathematical Reasoning | MATH 500 | pass@16 Success Rate | 90.11 | 3 |
| Mathematical Reasoning | AIME 24 | Pass@16 | 48.24 | 3 |
| Mathematical Reasoning | AMC 23 | pass@16 | 92.3 | 3 |
| Mathematical Reasoning | BeyondAIME | Pass@16 | 27.61 | 3 |