
Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning

About

Reinforcement learning significantly enhances LLM capabilities but suffers from a critical issue: length inflation, where models adopt verbosity or inefficient reasoning to maximize rewards. Prior approaches struggle to address this challenge in a general and lossless manner, primarily because additive penalties introduce a compensatory effect that creates optimization shortcuts, while heuristic gating strategies lack generality beyond binary feedback. To bridge this gap, we present Group Relative Reward Rescaling (GR$^3$), which reframes length control as a multiplicative rescaling paradigm, effectively establishing a generalized, continuous, and reward-dependent gating mechanism. To further ensure lossless optimization, we incorporate group-relative regularization and advantage-aware calibration, which dynamically adapt length budgets to instance difficulty and preserve the advantage signal of high-quality trajectories. Empirically, across both RLHF and RLVR settings, GR$^3$ maintains training dynamics and downstream performance comparable to standard GRPO while significantly mitigating length inflation, outperforming state-of-the-art length-regularized baselines.

Zichao Li, Jie Lou, Fangchen Dong, Zhiyuan Fan, Mengjie Ren, Hongyu Lin, Xianpei Han, Debing Zhang, Le Sun, Yaojie Lu, Xing Yu • 2026
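
The contrast the abstract draws between additive penalties and multiplicative rescaling can be illustrated with a small toy example. This is a minimal sketch under stated assumptions, not the GR$^3$ formulation: the abstract gives no formulas, so the functions `additive_penalty` and `multiplicative_rescale`, the coefficients `coeff` and `strength`, and the sample rewards and lengths are all hypothetical. Only the group standardization in `group_advantages` follows the standard GRPO recipe. The sketch also assumes non-negative rewards.

```python
from statistics import mean, pstdev


def group_advantages(rewards):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


def additive_penalty(reward, length, coeff=0.001):
    """Hypothetical additive length penalty: subtract a cost per token."""
    return reward - coeff * length


def multiplicative_rescale(reward, length, group_lengths, strength=0.5):
    """Hypothetical multiplicative rescaling (NOT the paper's formula).

    The reward is multiplied by a factor in (0, 1] that shrinks only when
    the response is longer than the group's mean length, so the length
    budget is relative to the group rather than fixed.
    """
    excess = max(0.0, length / mean(group_lengths) - 1.0)
    return reward / (1.0 + strength * excess)


# One group of four responses to the same prompt (made-up values):
# binary correctness rewards and response lengths in tokens.
rewards = [1.0, 1.0, 0.0, 0.0]
lengths = [1200, 400, 100, 900]

additive = [additive_penalty(r, n) for r, n in zip(rewards, lengths)]
rescaled = [multiplicative_rescale(r, n, lengths) for r, n in zip(rewards, lengths)]

additive_adv = group_advantages(additive)
rescaled_adv = group_advantages(rescaled)

for name, values in [("additive", additive_adv), ("rescaled", rescaled_adv)]:
    print(name, [round(v, 3) for v in values])
```

In this toy group, the additive penalty lets brevity compensate for quality: the short incorrect response ends up with a higher shaped reward, and a positive advantage, than the long correct one. Under the multiplicative variant a zero reward stays zero at any length, so the length term acts as a reward-dependent gate and both correct responses keep a positive advantage.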

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Mathematical Reasoning | MATH500 | Accuracy (Avg@4): 89.3 | 10 |
| Mathematical Reasoning | AIME 24 | Average Score (Top-32): 45.2 | 7 |
| Mathematical Reasoning | AIME 25 | Avg@32 Score: 32.8 | 7 |
| Mathematical Reasoning | AMC 23 | Average Accuracy @16: 93 | 7 |
| Mathematical Reasoning | MATH500 | Avg@4 Score: 94 | 7 |
| Mathematical Reasoning | AMC 23 | Avg@16 Score: 81.6 | 7 |
| Chat Performance | Arena-Hard-Auto | Score: 92.8 | 6 |
| Chat Performance | Alpaca Eval | Score: 55.8 | 6 |
| Code Generation | LiveCodeBench v6 | Score: 41.6 | 6 |