ReLaX: Reasoning with Latent Exploration for Large Reasoning Models

About

Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated remarkable potential in enhancing the reasoning capability of Large Reasoning Models (LRMs). However, RLVR often drives the policy toward over-determinism, resulting in ineffective exploration and premature policy convergence. While promoting token-level diversity has shown promise in mitigating entropy collapse, we argue that the latent dynamics underlying token generation encode a far richer computational structure for steering policy optimization toward a more effective exploration-exploitation tradeoff. To enable tractable analysis of, and intervention in, the latent dynamics of LRMs, we leverage Koopman operator theory to obtain a linearized representation of their hidden-state dynamics. This enables us to introduce Dynamic Spectral Dispersion (DSD), a new metric that quantifies the heterogeneity of the model's latent dynamics and serves as a direct indicator of policy exploration. Building upon these foundations, we propose Reasoning with Latent eXploration (ReLaX), a framework that explicitly incorporates latent dynamics to regulate exploration and exploitation during policy optimization. Comprehensive experiments across a wide range of multimodal and text-only reasoning benchmarks show that ReLaX consistently improves reasoning capability and outperforms existing token-level methods. Our project is available at https://github.com/ZhangShimin1/ReLaX.
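
The abstract does not spell out how the Koopman linearization or DSD is computed, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes the Koopman operator is approximated by a least-squares linear fit between consecutive hidden states (the standard dynamic mode decomposition recipe), and it uses the standard deviation of eigenvalue magnitudes as a hypothetical stand-in for a spectral dispersion measure. The names `fit_linear_operator` and `eigenvalue_spread` are invented for this example, and the toy trajectory replaces real LRM hidden states.

```python
import numpy as np

def fit_linear_operator(hidden_states):
    """Least-squares fit of K with h[t+1] ~= K @ h[t] (DMD-style).

    hidden_states: array of shape (T, d), one hidden state per step.
    Assumption: this finite-dimensional fit stands in for the Koopman
    operator; the paper's actual construction may differ.
    """
    X = hidden_states[:-1].T  # states at steps 0..T-2, shape (d, T-1)
    Y = hidden_states[1:].T   # states at steps 1..T-1, shape (d, T-1)
    return Y @ np.linalg.pinv(X)

def eigenvalue_spread(K):
    """Hypothetical dispersion proxy: std of eigenvalue magnitudes.

    This is NOT the paper's DSD definition, only an illustration of
    measuring how heterogeneous the fitted linear dynamics are.
    """
    eigvals = np.linalg.eigvals(K)
    return float(np.std(np.abs(eigvals)))

# Toy trajectory generated by a known linear map (stand-in for LRM states).
d, T = 4, 20
A_true = np.diag([0.9, 0.7, 0.5, 0.3])
h = np.empty((T, d))
h[0] = np.ones(d)
for t in range(T - 1):
    h[t + 1] = A_true @ h[t]

K = fit_linear_operator(h)
spread = eigenvalue_spread(K)
print("recovered operator close to A_true:", np.allclose(K, A_true, atol=1e-6))
print("eigenvalue spread:", round(spread, 4))
```

In this toy setting the fit recovers the generating matrix, and a larger spread indicates modes that decay at more varied rates. Whether ReLaX uses a comparable quantity is something only the paper and repository can confirm.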

Shimin Zhang, Xianwei Chen, Yufan Shen, Ziyuan Ye, Jibin Wu • 2025

Related benchmarks

Task	Dataset	Result
Multimodal Reasoning	MMMU (val)	Accuracy 57.4	168
Multimodal Reasoning	MMStar	Accuracy 65.5	143
Multimodal Mathematical Reasoning	MathVista mini (test)	Overall Accuracy 77.1	114
Multimodal Reasoning	DynaMath	Accuracy 55.9	77
Multi-modal Reasoning	EMMA	Accuracy 30.6	61
Multi-modal Reasoning	MathVision (test)	Accuracy (%) 30.2	45
Multimodal Reasoning	MathVerse (testmini)	Mean@1 Accuracy 55.7	17
