Covariance Volume Maximization for Embodied Latent Exploration in Deep Reinforcement Learning
Abstract
Efficient exploration remains a key challenge in deep reinforcement learning, especially for embodied agents operating in realistic environments with high-dimensional observations and complex dynamics. Recent latent exploration methods define bonuses in a learned latent space, but they often struggle in these settings for two reasons: (i) the learned representations can be noisy or policy-dependent, and (ii) common strategies such as randomized latent objectives or fixed directional spanning are brittle and fail to improve global coverage. We propose Covariance Volume Maximization (CVM), a coverage-driven latent exploration framework with two key components. First, we learn a behavioral state encoder with a policy-mixture objective that reduces representation drift under rapidly changing exploration policies, yielding stable and behaviorally meaningful latent displacements. Second, CVM rewards each transition by the exact increase it induces in the log-determinant of the covariance of recent latent displacements, explicitly expanding the explored region and prioritizing under-covered directions. This objective coincides with the classical D-optimal design criterion, giving it an information-efficiency justification. Extensive experiments on embodied navigation and manipulation tasks show that CVM substantially improves exploration efficiency and robustness and scales to diverse environments.
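To make the second component concrete, the following is a minimal sketch of a log-determinant covariance bonus of the kind the abstract describes, assuming a small buffer of recent latent displacements. The function names, buffer shape, and ridge term `eps` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def logdet_cov(deltas: np.ndarray, eps: float = 1e-6) -> float:
    """Log-determinant of the (ridge-regularized) covariance of a
    buffer of latent displacements with shape [n, d].

    The eps * I ridge is an assumed stabilizer so the determinant
    stays well-defined when the buffer is small or degenerate.
    """
    cov = np.cov(deltas, rowvar=False) + eps * np.eye(deltas.shape[1])
    _, logdet = np.linalg.slogdet(cov)
    return logdet

def cvm_bonus(deltas: np.ndarray, delta_new: np.ndarray) -> float:
    """Bonus for one transition: the exact increase in the
    log-det covariance volume when its latent displacement
    delta_new (shape [d]) is added to the buffer."""
    before = logdet_cov(deltas)
    after = logdet_cov(np.vstack([deltas, delta_new[None, :]]))
    return after - before
```

A displacement aligned with directions the buffer already covers barely changes the determinant and earns a small bonus, while one along an under-covered direction inflates the covariance volume and is rewarded, which matches the D-optimal design interpretation.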