Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning
Ming Chen ⋅ Sheng Tang ⋅ Rong-Xi Tan ⋅ Ziniu Li ⋅ Jiacheng Chen ⋅ Ke Xue ⋅ Chao Qian
Abstract
Decoding-based regression, which reformulates regression as a sequence generation task, has emerged as a promising paradigm for applying large language models to numerical prediction. However, its progress is hindered by the misalignment between discrete token-level objectives (e.g., cross-entropy) and continuous numerical values. Existing approaches that rely on token-level constraints often fail to capture the global magnitude of the target value, limiting their precision and generalization. In this paper, we propose to unlock the potential of decoding-based regression via reinforcement learning. We formulate the generation process as a Markov decision process and use sequence-level rewards to enforce global numerical coherence. Under this framework, we present GenRe$^2$, which pairs policy gradient methods, preserving error magnitudes, with dense expert supervision to resolve the temporal credit assignment challenge. Extensive experiments across tabular regression, code metric prediction, and generative reward modeling demonstrate that GenRe$^2$ consistently outperforms traditional baselines, establishing a robust paradigm for general-purpose numerical prediction.
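For intuition, the sketch below illustrates the sequence-level reward idea the abstract describes: the decoded number is scored as a whole rather than token by token, so an output that is numerically close to the target earns a high reward even when its digit tokens disagree with the target string. The tokenization, reward shape, malformed-output penalty, and the simple mean baseline are illustrative assumptions, not the paper's GenRe$^2$ implementation.

```python
def decode_value(token_seq):
    """Parse a digit-token sequence such as ['9', '.', '9', '9'] into a float.
    Returns None if the tokens do not form a valid number (illustrative choice)."""
    try:
        return float("".join(token_seq))
    except ValueError:
        return None

def sequence_reward(token_seq, target):
    """Sequence-level reward: negative absolute error of the decoded number.
    Unlike token-level cross-entropy, this scores the generated value globally:
    9.99 for target 10.0 is rewarded far more than 1.00, even though a
    token-level objective would treat both digit strings as similarly wrong."""
    value = decode_value(token_seq)
    if value is None:
        return -1e3  # assumed penalty for malformed outputs
    return -abs(value - target)

# REINFORCE-style weighting (schematic): each sampled sequence y contributes
# (R(y) - baseline) * sum_t grad log pi(y_t | y_<t, x); we show only the
# scalar advantage each trajectory would receive.
samples = [list("9.99"), list("10.2"), list("1.00")]
target = 10.0
rewards = [sequence_reward(s, target) for s in samples]
baseline = sum(rewards) / len(rewards)  # mean baseline for variance reduction
for s, r in zip(samples, rewards):
    print("".join(s), "advantage:", round(r - baseline, 3))
```

Running this prints a large positive advantage for "9.99", a smaller one for "10.2", and a strongly negative one for "1.00", mirroring how a sequence-level signal preserves error magnitudes where per-token losses do not.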