Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning
About
Decoding-based regression, which reformulates regression as a sequence generation task, has emerged as a promising paradigm for applying large language models to numerical prediction. However, its progress is hindered by the misalignment between discrete token-level objectives (e.g., cross-entropy) and continuous numerical values. Existing approaches that rely on token-level constraints often fail to capture the global magnitude of the target value, limiting their precision and generalization. In this paper, we propose to unlock the potential of decoding-based regression via Reinforcement Learning (RL). We formulate the generation process as a Markov Decision Process and use sequence-level rewards to enforce global numerical coherence. Extensive experiments on tabular regression and code metric regression demonstrate that our method (specifically with ReMax and GRPO) consistently outperforms both state-of-the-art token-level baselines and traditional regression heads, underscoring the value of sequence-level supervision. Our analysis further reveals that RL significantly enhances sampling efficiency and predictive precision, establishing decoding-based regression as a robust and accurate paradigm for general-purpose numerical prediction.
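The core idea is that the reward scores the *decoded number* as a whole, not individual tokens: under cross-entropy, "3.15" and "9.15" differ by a single token yet by orders of magnitude in value, which is exactly the misalignment a sequence-level signal can correct. As a minimal sketch of this setup (not the paper's released code; the tokenization, reward shape, and function names below are illustrative assumptions), a sequence-level reward combined with GRPO-style group-normalized advantages might look like:

```python
import math
import re


def decode_number(token_seq: list[str]) -> float | None:
    """Concatenate generated digit/sign tokens into a float.

    Returns None when the sequence is not a well-formed number, so the
    caller can assign a penalty reward instead of failing.
    """
    text = "".join(token_seq)
    # Assumed numeric format: optional sign, digits, optional decimal part.
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", text):
        return float(text)
    return None


def sequence_reward(token_seq: list[str], target: float,
                    invalid_penalty: float = -1.0) -> float:
    """Sequence-level reward: negative absolute error of the decoded value.

    Unlike a per-token likelihood, this scores the global magnitude of the
    prediction; malformed outputs receive a fixed penalty.
    """
    value = decode_number(token_seq)
    if value is None:
        return invalid_penalty
    return -abs(value - target)


def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Group-relative advantages as in GRPO: normalize each sampled
    completion's reward by the mean/std of its group of G samples."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


# Example: G = 4 sampled decodings for a target of 3.14.
group = [["3", ".", "1"], ["3", ".", "1", "4"], ["2", ".", "9"], ["x"]]
rewards = [sequence_reward(seq, target=3.14) for seq in group]
print(rewards)            # e.g. [-0.04, -0.0, -0.24, -1.0]
print(grpo_advantages(rewards))
```

These advantages would then weight the policy-gradient update on the sampled token sequences; ReMax follows the same pattern but uses a greedy-decoding baseline instead of group statistics.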
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Tabular regression | TALENT 100 regression tasks | RMSE (mean) | 0.5151 | 8 |
| Code metric regression | APPS Leetcode (test) | RMSE | 0.474 | 6 |
| Code metric regression | Triton Kernel Latency (test) | RMSE | 1.094 | 6 |