Poster

Variational Empowerment as Representation Learning for Goal-Conditioned Reinforcement Learning

Jongwook Choi ⋅ Archit Sharma ⋅ Honglak Lee ⋅ Sergey Levine ⋅ Shixiang Gu

Keywords: Neuroscience and Cognitive Science Neuroscience Reinforcement Learning and Planning Algorithms -> Representation Learning; Algorithms -> Sparse Coding and Dimensionality Expansion; Applications Matrix and Ten

2021 Poster


Abstract

Learning to reach goal states and learning diverse skills through mutual information maximization have been proposed as principled frameworks for unsupervised reinforcement learning, allowing agents to acquire broadly applicable multi-task policies with minimal reward engineering. In this paper, we discuss how these two approaches, goal-conditioned RL (GCRL) and MI-based RL, can be generalized into a single family of methods, interpreting mutual information maximization and variational empowerment as representation learning methods that acquire functionally aware state representations for goal reaching. Starting from a simple observation that the standard GCRL is encapsulated by the optimization objective of variational empowerment, we can derive novel variants of GCRL and variational empowerment under a single, unified optimization objective, such as adaptive-variance GCRL and linear-mapping GCRL, and study the characteristics of representation learning each variant provides. Furthermore, through the lens of GCRL, we show that adapting powerful techniques from GCRL such as goal relabeling into the variational MI context, as well as proper regularization on the variational posterior, provides substantial gains in algorithm performance, and propose a novel evaluation metric named latent goal reaching (LGR) as an objective measure for evaluating empowerment algorithms akin to goal-based RL. Through principled mathematical derivations and careful experimental validations, our work lays a novel foundation from which representation learning can be evaluated and analyzed in goal-based RL.
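
The observation that standard GCRL is encapsulated by the variational empowerment objective can be illustrated with a short derivation. This is a sketch under stated assumptions, not the paper's own presentation: the symbols z (latent goal or skill), q_\lambda (variational posterior), and \sigma (a fixed standard deviation) are notation introduced here and do not appear in the abstract.

```latex
% Standard variational lower bound on the mutual information between
% states s and latent codes z (notation assumed for illustration):
\begin{align}
I(s; z) &= H(z) - H(z \mid s) \\
        &\ge H(z) + \mathbb{E}_{z \sim p(z),\; s \sim \pi(\cdot \mid z)}
             \big[ \log q_\lambda(z \mid s) \big].
\end{align}
% Assumption: take z to be a goal g in state space and fix the posterior
% to an isotropic Gaussian centered at the current state,
%   q(g \mid s) = \mathcal{N}(g;\, s,\, \sigma^2 I).
% The per-step intrinsic reward then becomes
\begin{equation}
\log q(g \mid s) = -\frac{1}{2\sigma^2} \lVert g - s \rVert_2^2 + \text{const},
\end{equation}
% which is, up to scale and an additive constant, the negative squared
% distance reward commonly used in goal-conditioned RL.
```

Under this reading, the variants named in the abstract plausibly correspond to relaxing the fixed posterior: a learned variance in place of the fixed \sigma^2 for adaptive-variance GCRL, and a learned linear map of the state as the posterior mean for linear-mapping GCRL. These correspondences are an interpretation of the abstract, not a statement of the paper's exact definitions.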
