Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective

About

Reinforcement Learning with Verifiable Rewards (RLVR) serves as a cornerstone technique for enhancing the reasoning capabilities of Large Language Models (LLMs). However, its training is often plagued by entropy collapse, a rapid decline in policy entropy that limits exploration and undermines training effectiveness. While recent works attempt to mitigate this issue via several heuristic entropy interventions, the underlying mechanisms remain poorly understood. In this work, we conduct comprehensive theoretical and empirical analyses of entropy dynamics in RLVR, offering two main insights: (1) We derive a tight analytical approximation for token-level entropy change at each update step, revealing four governing factors and providing a unified theoretical framework to explain how existing methods influence entropy; (2) We reveal a fundamental limitation of recent approaches: they rely on heuristic adjustments to one or two of these factors, leaving other relevant factors unconsidered, thus inherently limiting their effectiveness. Motivated by these findings, we propose STEER, a principled entropy-modulation method that adaptively reweights tokens based on theoretically estimated entropy variations. Extensive experiments across six mathematical reasoning and three coding benchmarks demonstrate that STEER effectively mitigates entropy collapse and consistently outperforms state-of-the-art baselines.
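To make the notion of policy entropy concrete, here is a minimal sketch of how token-level entropy can be computed from next-token logits and averaged over a response. It is not the paper's STEER method or its analytical approximation. The function names and the toy logits are illustrative assumptions, and only the standard Shannon entropy definition is used.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution at one position."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_policy_entropy(per_token_logits):
    """Average token-level entropy over all positions of a response."""
    entropies = [token_entropy(logits) for logits in per_token_logits]
    return sum(entropies) / len(entropies)


# Hypothetical logits over a toy 4-token vocabulary (assumed values, not real data).
early_policy = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.5, 0.0, -0.5]]   # spread-out distributions
late_policy = [[8.0, 0.0, 0.0, 0.0], [10.0, 1.0, 0.0, -1.0]]   # sharply peaked distributions

print(mean_policy_entropy(early_policy))
print(mean_policy_entropy(late_policy))
```

In this toy setting the second policy puts almost all probability on one token per position, so its mean entropy is far lower than the first. A rapid drop of this kind during training is what the abstract calls entropy collapse.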

Zhezheng Hao, Hong Wang, Haoyang Liu, Jian Luo, Jiarui Yu, Hande Dong, Qiang Lin, Can Wang, Jiawei Chen • 2025

Related benchmarks

Task                    Dataset        Metric              Result  Rank
Mathematical Reasoning  Minerva        Accuracy (Acc)      28.2    146
Mathematical Reasoning  Olympiad       Accuracy            0.366   134
Mathematical Reasoning  Minerva Math   pass@1 Accuracy     41.7    104
Code Generation         LCB v5         Accuracy            31.8    45
Mathematical Reasoning  AIME 24        Accuracy            17.4    42
Mathematical Reasoning  MATH500        Pass@1              82.2    40
Mathematical Reasoning  AMC23          Accuracy            61.6    38
Mathematical Reasoning  AIME 25        Avg@32              16.1    34
Mathematical Reasoning  AMC 23         Avg@32              72.1    31
Mathematical Reasoning  OlympiadBench  Avg Score (avg@1)   43      13
Showing 10 of 12 rows
