What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis
Abstract
Latent Chain-of-Thought (CoT) aims to internalize reasoning into continuous hidden states, promising to transcend the computational bottlenecks of explicit tokens. However, the precise mechanisms ensuring its validity remain opaque. To bridge this gap, we establish an Information-Theoretic Framework that dissects supervision into Trajectory Control and State Alignment. Our analysis identifies structural scaffolding as the fundamental prerequisite for valid latent dynamics, and demonstrates that Outcome Supervision falters due to optimization barriers, while Process Supervision succeeds by minimizing conditional entropy, thereby enforcing trajectory predictability. We further expose a divergence between alignment strategies: rigid Geometric Compression acts as a destructive prior that collapses the reasoning manifold, whereas Generative Reconstruction serves as a flexible semantic tether, optimizing for reconstructibility to preserve the intrinsic dimensionality of the latent space. To quantify these dynamics, we introduce the Unified Latent-MI Probe (ULP), which unveils a strict Information-Performance Binding: reasoning accuracy is strongly correlated with the mutual information retained in the latent chain. Ultimately, we advocate a paradigm shift from geometric imitation to mutual information maximization to counter the information decay inherent in autoregressive generation.
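The abstract does not specify how the ULP estimates mutual information, so the following is only a minimal illustrative sketch of the underlying quantity: a plug-in estimate of I(Z; Y) between discretized latent codes and answers. The function name and the discrete-variable assumption are hypothetical, not the paper's actual probe.

```python
import numpy as np

def mutual_information(z, y):
    """Plug-in estimate of I(Z; Y) in nats for discrete sequences z, y.

    Illustrative only: assumes latent states have already been
    discretized (e.g., by clustering); the ULP itself is not specified
    in the abstract.
    """
    z, y = np.asarray(z), np.asarray(y)
    mi = 0.0
    for zv in np.unique(z):
        for yv in np.unique(y):
            p_zy = np.mean((z == zv) & (y == yv))  # joint probability
            if p_zy > 0:
                p_z = np.mean(z == zv)             # marginals
                p_y = np.mean(y == yv)
                mi += p_zy * np.log(p_zy / (p_z * p_y))
    return mi

# The Information-Performance Binding in miniature: a latent chain that
# retains answer information has high MI; a collapsed chain has none.
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])   # answers
z_good = y.copy()                        # latent retains all information
z_bad = np.zeros_like(y)                 # latent collapsed to one state

print(mutual_information(z_good, y))     # ≈ ln 2 ≈ 0.693 nats
print(mutual_information(z_bad, y))      # 0.0 nats
```

Here geometric collapse (`z_bad`) drives the retained mutual information to zero, which is the failure mode the abstract attributes to rigid Geometric Compression.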