Linear Dynamics in the RLVR Training of Large Language Models
About
Reinforcement learning with verifiable rewards (RLVR) has driven significant performance gains in reasoning-oriented large language models (LLMs), yet its internal training dynamics remain largely a black box. In this work, we perform a comprehensive trajectory-level analysis of RLVR and uncover a striking regularity: across various model families, RL algorithms, and training configurations, RLVR consistently enters a robust linear regime, where both parameter weights and output log-probabilities, measured rigorously via teacher-forced evaluation, evolve in a highly linear manner ($R^2 > 0.7$). Through controlled experiments and theoretical analysis, we demonstrate that this linearity is not a coincidence, but stems from the high-variance, noisy nature of RLVR training signals, which act as a low-pass filter to concentrate optimization along a stable, low-dimensional drift. Moreover, we show that this linear structure is not merely descriptive but powerfully predictive and actionable. Specifically, weight-space extrapolation matches the performance of standard RL optimization while achieving a 6.1x training speedup through periodic re-grounding. Meanwhile, output-space extrapolation serves as a lightweight intervention that effectively bypasses late-stage model collapse, consistently outperforming standard RL across mathematical and coding benchmarks, with an average performance improvement of 4.2%. Our code is available at https://github.com/Miaow-Lab/RLVR-Linearity.
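As a rough illustration of what "linear evolution" and weight-space extrapolation could mean in practice, the sketch below fits a per-parameter straight line to a handful of saved checkpoints, reports the $R^2$ of that fit, and extrapolates to a later step. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: checkpoints are assumed to be flattened into 1-D vectors, and the function names (`fit_linear_trajectory`, `r_squared`, `extrapolate`) are hypothetical. The periodic re-grounding mentioned in the abstract is not modeled here.

```python
import numpy as np

def fit_linear_trajectory(steps, checkpoints):
    """Least-squares fit of w_i(t) ~ intercept_i + slope_i * t for each parameter i.

    steps:       sequence of training-step indices, length K
    checkpoints: array-like of shape (K, P), one flattened weight vector per step
    (Hypothetical helper; the flattened layout is an assumption of this sketch.)
    """
    t = np.asarray(steps, dtype=np.float64)
    W = np.asarray(checkpoints, dtype=np.float64)
    t_mean = t.mean()
    W_mean = W.mean(axis=0)
    t_centered = t - t_mean
    # Closed-form simple linear regression, vectorized over all parameters.
    slope = (t_centered[:, None] * (W - W_mean)).sum(axis=0) / (t_centered ** 2).sum()
    intercept = W_mean - slope * t_mean
    return intercept, slope

def r_squared(steps, checkpoints, intercept, slope):
    """Per-parameter coefficient of determination of the linear fit."""
    t = np.asarray(steps, dtype=np.float64)
    W = np.asarray(checkpoints, dtype=np.float64)
    pred = intercept + np.outer(t, slope)
    ss_res = ((W - pred) ** 2).sum(axis=0)
    ss_tot = ((W - W.mean(axis=0)) ** 2).sum(axis=0)
    # A parameter that never moves has ss_tot == 0; its residual is also ~0,
    # so dividing by 1 instead reports R^2 = 1 for it.
    return 1.0 - ss_res / np.where(ss_tot > 0, ss_tot, 1.0)

def extrapolate(intercept, slope, target_step):
    """Predict the weight vector at target_step from the fitted line."""
    return intercept + slope * float(target_step)
```

A usage example on synthetic data (a constant drift plus small noise, standing in for real checkpoints):

```python
rng = np.random.default_rng(0)
steps = [0, 100, 200, 300]
drift = rng.normal(size=1000) * 1e-3
ckpts = np.stack([s * drift + rng.normal(size=1000) * 1e-3 for s in steps])

intercept, slope = fit_linear_trajectory(steps, ckpts)
print("mean R^2:", r_squared(steps, ckpts, intercept, slope).mean())
w_600 = extrapolate(intercept, slope, 600)  # predicted weights at step 600
```

For a real model the same fit would be applied tensor by tensor rather than on one giant vector, and an extrapolated checkpoint would need to be evaluated before being trusted.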
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | AIME 2024 | Accuracy | 14.6 | 479 |
| Mathematical Reasoning | Minerva | Accuracy (Acc) | 28.6 | 146 |
| Mathematical Reasoning | AIME 26 | Accuracy | 23.3 | 41 |
| Mathematical Reasoning | AMC23 | Accuracy | 68.4 | 38 |
| Mathematical Reasoning | OlympBench | Pass@1 | 48.5 | 29 |
| Mathematical Reasoning | HMMT25 | Accuracy (HMMT25) | 16.3 | 21 |
| Multiple-choice Question Answering | MMLU-Pro | Biology Accuracy | 82.8 | 20 |
| Multi-task Language Understanding | MMLU Pro (test) | History Score | 62.8 | 20 |
| Mathematical Reasoning | Mathematical Reasoning Tasks AMC23 Minerva | AMC23 Score | 39.4 | 16 |
| Mathematical Reasoning | OlymMATH | Accuracy | 7.8 | 16 |