Text Generation as Continuous Latent Dynamics via Reinforcement Learning
Abstract
We propose to model text generation as a continuous-time latent dynamical process: each emitted token is an action in a Markov Decision Process whose latent state evolves between emissions via a neural ODE. This formulation bridges discrete token sequences and continuous semantic evolution, providing a theoretically grounded approach to coherent long-range generation. The framework is optimized with reinforcement learning, maximizing a composite objective that combines task-specific rewards with a knowledge-distillation term from a powerful pre-trained language model. Experiments demonstrate that our method, the Continuous-Time Latent Language Model (CT-LLM), outperforms discrete baselines in generation coherence and long-context performance, offering a new paradigm for fluid and controllable language generation.
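The generation loop the abstract describes — evolve a continuous latent state via an ODE, emit a token, score the step with a task reward minus a distillation penalty — can be sketched as below. Everything here is an illustrative assumption, not the paper's architecture: the dimensions, the tanh vector field, the Euler integrator, the linear readout, the synthetic teacher distribution, and the weight `beta` are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper).
D_LATENT, VOCAB = 16, 32

# Stand-in neural ODE vector field: dh/dt = tanh(W h).
W = rng.normal(scale=0.1, size=(D_LATENT, D_LATENT))

def f(h, t):
    return np.tanh(W @ h)

def evolve(h, t0, t1, steps=10):
    """Euler-integrate the latent state between token emissions."""
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        h = h + dt * f(h, t)
        t += dt
    return h

# Linear readout from latent state to token logits (assumed form).
W_out = rng.normal(scale=0.1, size=(VOCAB, D_LATENT))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def composite_reward(task_reward, p_student, p_teacher, beta=0.1):
    """Task reward minus a KL distillation penalty toward the teacher."""
    kl = float(np.sum(p_teacher * np.log(p_teacher / p_student)))
    return task_reward - beta * kl

# One step of the MDP: continuous latent evolution, then discrete emission.
h = rng.normal(size=D_LATENT)
h = evolve(h, 0.0, 1.0)
p_student = softmax(W_out @ h)
p_teacher = softmax(rng.normal(size=VOCAB))  # synthetic teacher for the sketch
r = composite_reward(1.0, p_student, p_teacher)
```

An RL algorithm would then use `r` as the per-step return signal; in practice the Euler integrator would be replaced by an adaptive ODE solver and the teacher distribution by a pre-trained language model's next-token distribution.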