Covariance Volume Maximization for Embodied Latent Exploration in Deep Reinforcement Learning
Abstract
Efficient exploration remains a key challenge in deep reinforcement learning, especially for embodied agents operating in realistic environments with high-dimensional observations and complex dynamics. Recent latent exploration methods define bonuses in a learned latent space, but they often struggle in these settings for two reasons: (i) the learned representations can be noisy or policy-dependent, and (ii) common strategies such as randomized latent objectives or fixed directional spanning are brittle and fail to improve global coverage. We propose Covariance Volume Maximization (CVM), a coverage-driven latent exploration framework with two key components. First, we learn a behavioral state encoder with a policy-mixture objective that reduces representation drift under rapidly changing exploration policies, yielding stable and behaviorally meaningful latent displacements. Second, CVM rewards each transition by the exact increase it induces in the log-determinant of the covariance of recent latent displacements, explicitly expanding the explored region and prioritizing under-covered directions. This objective coincides with the classical D-optimal design criterion, giving it an information-efficiency justification. Extensive experiments on embodied navigation and manipulation tasks show that CVM substantially improves exploration efficiency and robustness and scales to diverse environments.
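To make the second component concrete, the following is a minimal sketch of a log-determinant covariance bonus of the kind the abstract describes, assuming a small buffer of recent latent displacements. The function names, buffer shape, and ridge term `eps` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def logdet_cov(deltas: np.ndarray, eps: float = 1e-6) -> float:
    """Log-determinant of the (ridge-regularized) covariance of a
    buffer of latent displacements with shape [n, d].

    The eps * I ridge is an assumed stabilizer so the determinant
    stays well-defined when the buffer is small or degenerate.
    """
    cov = np.cov(deltas, rowvar=False) + eps * np.eye(deltas.shape[1])
    _, logdet = np.linalg.slogdet(cov)
    return logdet

def cvm_bonus(deltas: np.ndarray, delta_new: np.ndarray) -> float:
    """Bonus for one transition: the exact increase in the
    log-det covariance volume when its latent displacement
    delta_new (shape [d]) is added to the buffer."""
    before = logdet_cov(deltas)
    after = logdet_cov(np.vstack([deltas, delta_new[None, :]]))
    return after - before
```

A displacement aligned with directions the buffer already covers barely changes the determinant and earns a small bonus, while one along an under-covered direction inflates the covariance volume and is rewarded, which matches the D-optimal design interpretation.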