Linear Dynamics in the RLVR Training of Large Language Models

About

Reinforcement learning with verifiable rewards (RLVR) has driven significant performance gains in reasoning-oriented large language models (LLMs), yet its internal training dynamics remain largely a black box. In this work, we perform a comprehensive trajectory-level analysis of RLVR and uncover a striking regularity: across various model families, RL algorithms, and training configurations, RLVR consistently enters a robust linear regime, where both parameter weights and output log-probabilities, measured rigorously via teacher-forced evaluation, evolve in a highly linear manner ($R^2 > 0.7$). Through controlled experiments and theoretical analysis, we demonstrate that this linearity is not a coincidence, but stems from the high-variance, noisy nature of RLVR training signals, which act as a low-pass filter to concentrate optimization along a stable, low-dimensional drift. Moreover, we show that this linear structure is not merely descriptive but powerfully predictive and actionable. Specifically, weight-space extrapolation matches the performance of standard RL optimization while achieving a 6.1x training speedup through periodic re-grounding. Meanwhile, output-space extrapolation serves as a lightweight intervention that effectively bypasses late-stage model collapse, consistently outperforming standard RL across mathematical and coding benchmarks, with an average performance improvement of 4.2%. Our code is available at https://github.com/Miaow-Lab/RLVR-Linearity.

Tianle Wang, Jiayu Liu, Zhongyuan Wu, Shenghao Jin, Wei Chen, Hao Xu, Ning Miao • 2026
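
To make the abstract's two central ideas concrete (scoring how linear a training trajectory is with $R^2$, and extrapolating weights along the fitted line), here is a minimal sketch. It is not the authors' implementation, which lives in the repository linked above. The function names `fit_linear_trajectory` and `extrapolate`, the per-coordinate least-squares fit, the pooled $R^2$, and the toy checkpoint values are all assumptions made for illustration.

```python
def fit_linear_trajectory(steps, checkpoints):
    """Fit theta_j(t) ~ intercept_j + slope_j * t for every coordinate j.

    steps:       list of training-step indices, length T (not all equal).
    checkpoints: list of T flattened parameter vectors, each of length D.
    Returns (intercept, slope, r2), where r2 is pooled over all coordinates.
    Assumption: at least one coordinate actually changes, so the total
    sum of squares is nonzero.
    """
    n = len(steps)
    dim = len(checkpoints[0])
    t_mean = sum(steps) / n
    t_var = sum((t - t_mean) ** 2 for t in steps)

    intercept, slope = [], []
    ss_res = 0.0  # squared error left after the linear fit
    ss_tot = 0.0  # squared deviation from each coordinate's mean
    for j in range(dim):
        xs = [ckpt[j] for ckpt in checkpoints]
        x_mean = sum(xs) / n
        # Closed-form ordinary least squares for one coordinate.
        b = sum((t - t_mean) * (x - x_mean) for t, x in zip(steps, xs)) / t_var
        a = x_mean - b * t_mean
        ss_res += sum((x - (a + b * t)) ** 2 for t, x in zip(steps, xs))
        ss_tot += sum((x - x_mean) ** 2 for x in xs)
        intercept.append(a)
        slope.append(b)

    r2 = 1.0 - ss_res / ss_tot
    return intercept, slope, r2


def extrapolate(intercept, slope, step):
    """Predict the parameter vector at a step beyond the fitted range."""
    return [a + b * step for a, b in zip(intercept, slope)]


# Hypothetical toy trajectory: three parameters saved at four steps.
toy_steps = [0, 100, 200, 300]
toy_checkpoints = [
    [0.10, -0.50, 1.00],
    [0.21, -0.58, 1.00],
    [0.29, -0.71, 1.01],
    [0.40, -0.79, 1.00],
]
a, b, r2 = fit_linear_trajectory(toy_steps, toy_checkpoints)
predicted = extrapolate(a, b, 600)  # weights guessed for step 600
print(f"pooled R^2 = {r2:.3f}")
print("extrapolated weights:", [round(w, 3) for w in predicted])
```

In a real run the vectors would be flattened model weights (or teacher-forced log-probabilities) rather than three numbers. The extrapolated point would then serve as a starting point for further RL training, which is one plausible reading of the "periodic re-grounding" mentioned in the abstract.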

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Mathematical Reasoning | AIME 2024 | Accuracy: 14.6 | 479 |
| Mathematical Reasoning | Minerva | Accuracy (Acc): 28.6 | 146 |
| Mathematical Reasoning | AIME 26 | Accuracy: 23.3 | 41 |
| Mathematical Reasoning | AMC23 | Accuracy: 68.4 | 38 |
| Mathematical Reasoning | OlympBench | Pass@1: 48.5 | 29 |
| Mathematical Reasoning | HMMT25 | Accuracy (HMMT25): 16.3 | 21 |
| Multiple-choice Question Answering | MMLU-Pro | Biology Accuracy: 82.8 | 20 |
| Multi-task Language Understanding | MMLU Pro (test) | History Score: 62.8 | 20 |
| Mathematical Reasoning | Mathematical Reasoning Tasks AMC23 Minerva | AMC23 Score: 39.4 | 16 |
| Mathematical Reasoning | OlymMATH | Accuracy: 7.8 | 16 |
Showing 10 of 11 rows
