On Predictability of Reinforcement Learning Dynamics for Large Language Models

About

Recent advances in the reasoning capabilities of large language models (LLMs) are largely driven by reinforcement learning (RL), yet the underlying parameter dynamics during RL training remain poorly understood. This work identifies two fundamental properties of RL-induced parameter updates in LLMs: (1) Rank-1 Dominance, where the top singular subspace of the parameter update matrix nearly fully determines reasoning improvements, recovering over 99% of performance gains; and (2) Rank-1 Linear Dynamics, where this dominant subspace evolves linearly throughout training, enabling accurate prediction from early checkpoints. Extensive experiments across 8 LLMs and 7 algorithms validate the generalizability of these properties. More importantly, based on these findings, we propose AlphaRL, a plug-in acceleration framework that extrapolates the final parameter update using a short early training window, achieving up to 2.5× speedup while retaining more than 96% of reasoning performance without extra modules or hyperparameter tuning. This positions our finding as a versatile and practical tool for large-scale RL, opening a path toward a principled, interpretable, and efficient training paradigm for LLMs.

Yuchen Cai, Ding Cao, Xin Xu, Zijun Yao, Yuqing Huang, Zhenyu Tan, Benyi Zhang, Guangzhong Sun, Guiquan Liu, Junfeng Fang • 2025
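
The two properties named in the abstract can be illustrated with a small toy sketch. It is not the authors' AlphaRL implementation: the function names (`rank1_approx`, `extrapolate`), the use of power iteration in place of a library SVD, and the element-wise linear extrapolation between two early checkpoints are all assumptions made for illustration. The sketch takes the update matrix (checkpoint weights minus base weights) at two early steps, keeps only its top singular component, and extends the straight line through those two rank-1 updates to a later step.

```python
import math


def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def frob(A):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in A for v in row))


def rank1_approx(A, iters=100):
    """Approximate sigma_1 * u_1 * v_1^T using power iteration on A^T A.

    Hypothetical helper: a real pipeline would more likely call a library SVD.
    """
    At = [list(col) for col in zip(*A)]
    # Fixed start vector; assumed not orthogonal to the top right singular vector.
    v = [1.0] * len(A[0])
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        n = math.sqrt(sum(x * x for x in w))
        if n == 0.0:  # zero matrix: nothing to approximate
            return [[0.0] * len(A[0]) for _ in A]
        v = [x / n for x in w]
    Av = matvec(A, v)  # equals sigma_1 * u_1
    return [[a * b for b in v] for a in Av]


def extrapolate(delta_a, step_a, delta_b, step_b, target_step):
    """Linearly extend the rank-1 update from two early steps to target_step.

    Assumption: the rank-1 update moves along a straight line in training
    steps, so two early checkpoints are enough to define that line.
    """
    ra, rb = rank1_approx(delta_a), rank1_approx(delta_b)
    scale = (target_step - step_a) / (step_b - step_a)
    return [
        [x + scale * (y - x) for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(ra, rb)
    ]


if __name__ == "__main__":
    # Toy data: a rank-1 "final" update that grows linearly, plus fixed noise.
    final = [[3.0, 4.0], [6.0, 8.0], [6.0, 8.0]]
    noise = [[0.01, -0.02], [0.0, 0.01], [-0.01, 0.0]]

    def update_at(step, total=100):
        return [
            [step / total * f + e for f, e in zip(rf, re)]
            for rf, re in zip(final, noise)
        ]

    pred = extrapolate(update_at(10), 10, update_at(20), 20, 100)
    diff = [[p - f for p, f in zip(rp, rf)] for rp, rf in zip(pred, final)]
    print("relative error:", frob(diff) / frob(final))
```

Extrapolating the rank-1 matrix itself, rather than the singular vectors separately, sidesteps the sign ambiguity of singular vectors; whether AlphaRL does the same is not stated in the abstract.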

Related benchmarks

Task | Dataset | Result | Rank
Mathematical Reasoning | AIME 2024 | Accuracy 40 | 370
Science Reasoning | GPQA | Accuracy 57.75 | 243
Mathematical Reasoning | AIME 2024 | Accuracy 15.4 | 104
Mathematical Reasoning | AIME 2025 | Acc 31.8 | 81
Mathematical Reasoning | Minerva | Accuracy (Acc) 30.2 | 62
Reasoning | GPQA | Accuracy 49.25 | 57
Mathematical Reasoning | Mathematical Reasoning Suite (AIME24, AIME25, MATH, MINERVA, GPQA, GSM8K) Standard (test) | AIME24 Score 68.25 | 26
Multiple-choice Question Answering | MMLU-Pro | Biology Accuracy 81.1 | 20
Multi-task Language Understanding | MMLU Pro (test) | History Score 61.1 | 20
Mathematical Reasoning | Mathematical Reasoning Tasks AMC23 Minerva | AMC23 Score 41.3 | 16
Showing 10 of 15 rows
