Learning Task-Sufficient World Models by Synergizing Agentic Exploration and Structured Modeling
Abstract
Learning and planning in imagination with world models is an effective paradigm for training decision-making agents. However, existing approaches often rely on high-dimensional latent spaces or generic visual embeddings that retain many factors irrelevant to control, limiting efficiency and generalization across tasks. To address this, we study how agents can learn world models whose representations are task-specific, minimal, and sufficient for decision-making. We achieve this via a closed-loop synergy between the agent and the world model, in which structured world-model learning distills task-sufficient representations from informative interaction data. On the agent side, the agent actively probes the environment, guided by an adaptive curriculum, to collect informative trajectories that expose task-relevant latent factors. On the world-model side, we learn structured representations over observations that distill compact, task-sufficient latent states from the collected interaction data. Empirically, this synergy recovers task-sufficient latent representations that capture all control-relevant factors. Policies trained on these representations achieve improved sample efficiency and systematic generalization, including generalization across skills, object–skill compositions, and previously unseen tasks, on standard continuous control and robotic manipulation benchmarks.