
Not All Steps are Informative: On the Linearity of LLMs' RLVR Training

About

Reinforcement learning with verifiable rewards (RLVR) has become a central component of large language model (LLM) post-training. Unlike supervised fine-tuning (SFT), RLVR lets an LLM generate multiple candidate solutions and reinforces those that lead to a verifiably correct final answer. In practice, however, RLVR often requires thousands of training steps to reach strong performance, incurring substantial computation largely attributed to prolonged exploration. In this work, we make a surprising observation: during RLVR, LLMs evolve in a strongly linear manner. Specifically, both model weights and model output log-probabilities exhibit strong linear correlations with RL training steps. This suggests that RLVR predominantly amplifies trends that emerge early in training, rather than continuously discovering new behaviors throughout the entire optimization trajectory. Motivated by this linearity, we investigate whether future model states can be predicted from intermediate checkpoints via extrapolation, avoiding continued expensive training. We show that Weight Extrapolation produces models with performance comparable to standard RL training while requiring significantly less computation. Moreover, Logits Extrapolation consistently outperforms continued RL training on mathematics and code benchmarks by extrapolating beyond the step range where RL training remains stable. Our code is available at https://github.com/Miaow-Lab/RLVR-Linearity.
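The weight-extrapolation idea described above can be sketched in a few lines: fit a per-parameter linear trend across a handful of intermediate checkpoints, then evaluate that trend at a future step instead of continuing RL training. This is a minimal illustration under assumed inputs (a dict of flattened weight snapshots per parameter tensor), not the authors' implementation; the function name and data layout are hypothetical.

```python
import numpy as np

def extrapolate_weights(checkpoints, steps, target_step):
    """Linearly extrapolate model weights to a future training step.

    checkpoints: dict mapping parameter name -> list of 1-D weight
        arrays, one snapshot per saved checkpoint (hypothetical layout)
    steps: training-step indices at which the checkpoints were saved
    target_step: the future step whose weights we want to predict
    """
    steps = np.asarray(steps, dtype=float)
    predicted = {}
    for name, snaps in checkpoints.items():
        w = np.stack(snaps)  # shape (num_checkpoints, num_params)
        # Least-squares fit w ≈ slope * step + intercept, per parameter;
        # np.polyfit handles all parameters at once when y is 2-D.
        slope, intercept = np.polyfit(steps, w, deg=1)
        predicted[name] = slope * target_step + intercept
    return predicted
```

If the linearity observation holds, the same fit applied to output log-probabilities (rather than weights) would correspond to the paper's Logits Extrapolation variant.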

Tianle Wang, Zhongyuan Wu, Shenghao Jin, Hao Xu, Wei Chen, Ning Miao · 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Mathematical Reasoning | AIME 2024 | Accuracy | 14.6 | 104 |
| Mathematical Reasoning | Minerva | Accuracy | 28.6 | 62 |
| Multiple-choice Question Answering | MMLU-Pro | Biology Accuracy | 82.8 | 20 |
| Multi-task Language Understanding | MMLU Pro (test) | History Score | 62.8 | 20 |
| Mathematical Reasoning | Mathematical Reasoning Tasks (AMC23, Minerva) | AMC23 Score | 39.4 | 16 |
| Mathematical Reasoning | OlymMATH | Accuracy | 7.8 | 16 |
| Mathematical Reasoning | Aggregate Mathematical Tasks (AIME24/25, AMC23, Minerva, OlymMATH) | Average Score | 25 | 16 |
