Mind Dreamer: Untethering Imagination via Active Counterfactual Reasoning on Latent Manifolds
Shaojun Xu ⋅ Xiaoling Zhou ⋅ Yihan Lin ⋅ Yapeng Meng ⋅ Xinglong Ji ⋅ Luping Shi ⋅ Rong Zhao
Abstract
Model-Based Reinforcement Learning (MBRL) leverages latent imagination for sample efficiency, yet remains constrained by **Historical Tethering**: imagination is typically initialized from observed states. This creates a learning asymmetry in which the world model's manifold discovery outpaces the policy's sparse-reward optimization. We propose **Mind Dreamer (MD)**, a framework that operationalizes **Active Counterfactual Reasoning (ACR)** to transcend Markovian continuity. MD reformulates discovery as the minimization of a global Relay Manifold Expected Free Energy (R-EFE). By invoking a latent-space $do$-operator, MD uses an adversarial generator to synthesize discontinuous **latent jumps** to epistemic blind spots that are physically plausible yet cognitively challenging. To resolve the credit assignment paradox across these spatial ruptures, we derive the **Relay Value Function (RVF)** and **Relay Uncertainty Function (RUF)**. These potentials treat synthesized anchors as latent bridges, propagating pragmatic and epistemic value through a principled Bellman-style formulation. Notably, we prove that propagating uncertainty across discontinuities requires a quadratic discount $\gamma^2$, establishing a formal epistemic horizon. Theoretically, MD acts as an optimal importance sampler that expands the manifold's spectral gap, reducing the hitting time to critical bottleneck states. Empirically, MD achieves a **1.67$\times$ average speedup** over DreamerV3 on the DeepMind Control Suite, reaching **8.8$\times$** on sparse-reward tasks.
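The quadratic discount can be illustrated with a small sketch. It assumes, purely for illustration, that uncertainty is a variance-like quantity with independent per-step noise, so that $\mathrm{Var}(\gamma X) = \gamma^2 \mathrm{Var}(X)$. The function `discounted_backup` and the toy numbers are hypothetical and are not the paper's RVF or RUF definitions.

```python
def discounted_backup(rewards, variances, gamma):
    """Backward Bellman-style recursion over a short chain of states.

    Assumption (illustrative only): per-step noise is independent, so a
    variance-like uncertainty is discounted by gamma**2 while the value
    is discounted by gamma.
    """
    value, uncertainty = 0.0, 0.0
    values, uncertainties = [], []
    for r, var in zip(reversed(rewards), reversed(variances)):
        value = r + gamma * value                       # linear discount
        uncertainty = var + gamma ** 2 * uncertainty    # quadratic discount
        values.append(value)
        uncertainties.append(uncertainty)
    return values[::-1], uncertainties[::-1]


# Toy chain: sparse reward at the final step, equal noise at every step.
rewards = [0.0, 0.0, 1.0]
variances = [0.5, 0.5, 0.5]
gamma = 0.9

values, uncertainties = discounted_backup(rewards, variances, gamma)

# Closed form for the first state: sum_k gamma^(2k) * variance_k.
closed_form = sum(gamma ** (2 * k) * v for k, v in enumerate(variances))
```

Under these assumptions the uncertainty term decays faster than the value term, which gives one intuition for a shorter effective horizon for epistemic quantities than for reward.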