Spotlight
Action-Sufficient State Representation Learning for Control with Structural Constraints
Biwei Huang · Chaochao Lu · Liu Leqi · Jose Miguel Hernandez-Lobato · Clark Glymour · Bernhard Schölkopf · Kun Zhang

Wed Jul 20 11:15 AM -- 11:20 AM (PDT)

Perceived signals in real-world scenarios are usually high-dimensional and noisy. Finding and using a representation of them that contains the essential and sufficient information required by downstream decision-making tasks helps improve computational efficiency and generalization in those tasks. In this paper, we focus on partially observable environments and propose to learn a minimal set of state representations that capture sufficient information for decision-making, termed Action-Sufficient state Representations (ASRs). We build a generative environment model for the structural relationships among variables in the system and present a principled way to characterize ASRs based on structural constraints and the goal of maximizing cumulative reward in policy learning. We then develop a structured sequential Variational Auto-Encoder to estimate the environment model and extract ASRs. Our empirical results on CarRacing and VizDoom demonstrate a clear advantage of learning and using ASRs for policy learning. Moreover, the estimated environment model and ASRs allow learning behaviors from imagined outcomes in the compact latent space, which improves sample efficiency.

Author Information

Biwei Huang (Carnegie Mellon University)
Chaochao Lu (University of Cambridge)
Liu Leqi (Carnegie Mellon University)
Jose Miguel Hernandez-Lobato (University of Cambridge)
Clark Glymour (Carnegie Mellon University)
Bernhard Schölkopf (MPI for Intelligent Systems Tübingen, Germany)

Bernhard Schölkopf received degrees in mathematics (London) and physics (Tübingen), and a doctorate in computer science from the Technical University Berlin. He has conducted research at AT&T Bell Labs; GMD FIRST, Berlin; the Australian National University, Canberra; and Microsoft Research Cambridge (UK). In 2001, he was appointed a scientific member of the Max Planck Society and director at the MPI for Biological Cybernetics; in 2010 he founded the Max Planck Institute for Intelligent Systems. For further information, see www.kyb.tuebingen.mpg.de/~bs.

Kun Zhang (Carnegie Mellon University)
