What goals should a multi-goal reinforcement learning agent pursue during training in long-horizon tasks? When the desired (test-time) goal distribution is too distant to offer a useful learning signal, we argue that the agent should not pursue unobtainable goals. Instead, it should set its own intrinsic goals that maximize the entropy of the historical achieved goal distribution. We propose to optimize this objective by having the agent pursue past achieved goals in sparsely explored areas of the goal space, which focuses exploration on the frontier of the achievable goal set. We show that our strategy achieves an order of magnitude better sample efficiency than the prior state of the art on long-horizon multi-goal tasks including maze navigation and block stacking.
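The goal-selection idea in the abstract (pursue past achieved goals that lie in sparsely explored regions) can be illustrated with a small sketch. This is an assumption-laden illustration, not the authors' implementation: it scores past achieved goals with an unnormalized Gaussian kernel density estimate and returns the lowest-density candidate as the next intrinsic goal. The helper names `gaussian_kde_density` and `select_frontier_goal`, as well as the `bandwidth` and `num_candidates` parameters, are hypothetical choices made for this example.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# A goal is a point in goal space, e.g. an (x, y) position in a maze.
Goal = Tuple[float, ...]


def gaussian_kde_density(point: Goal, samples: Sequence[Goal], bandwidth: float) -> float:
    """Hypothetical helper: unnormalized Gaussian kernel density of `point` under `samples`."""
    total = 0.0
    for s in samples:
        sq_dist = sum((p - q) ** 2 for p, q in zip(point, s))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / len(samples)


def select_frontier_goal(
    achieved_goals: List[Goal],
    num_candidates: int = 16,
    bandwidth: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Goal:
    """Hypothetical sketch (not the paper's code): pick the least-dense past achieved goal.

    A random subset of achieved goals is scored by estimated density; the one in
    the most sparsely explored region is returned as the agent's next goal.
    """
    rng = rng or random.Random()
    k = min(num_candidates, len(achieved_goals))
    candidates = rng.sample(achieved_goals, k)
    # Lowest estimated density = most sparsely explored part of the goal space.
    return min(candidates, key=lambda g: gaussian_kde_density(g, achieved_goals, bandwidth))


# Usage: four goals clustered near the origin plus one goal far away.
goals = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(select_frontier_goal(goals, num_candidates=len(goals)))  # (5.0, 5.0)
```

Because every achieved goal is a candidate in the usage example, the isolated goal at (5.0, 5.0) is always selected. Repeatedly targeting such low-density goals tends to spread the achieved goal distribution outward, which is the intuition behind maximizing its entropy; a practical agent would also maintain the density model incrementally rather than rescoring the full history each time.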
Author Information
Silviu Pitis (University of Toronto)
Harris Chan (University of Toronto, Vector Institute)
Stephen Zhao (University of Toronto)
Bradly Stadie (Vector Institute)
Jimmy Ba (University of Toronto)
More from the Same Authors
2021: On Low Rank Training of Deep Neural Networks
Siddhartha Kamalakara · Acyr Locatelli · Bharat Venkitesh · Jimmy Ba · Yarin Gal · Aidan Gomez
2022: You Can’t Count on Luck: Why Decision Transformers Fail in Stochastic Environments
Keiran Paster · Sheila McIlraith · Jimmy Ba
2023: Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu
2023: Training on Thin Air: Improve Image Classification with Generated Data
Yongchao Zhou · Hshmat Sahak · Jimmy Ba
2023: A Generative Model for Text Control in Minecraft
Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith
2023: Calibrating Language Models via Augmented Prompt Ensembles
Mingjian Jiang · Yangjun Ruan · Sicong Huang · Saifei Liao · Silviu Pitis · Roger Grosse · Jimmy Ba
2023: Using Synthetic Data for Data Augmentation to Improve Classification Accuracy
Yongchao Zhou · Hshmat Sahak · Jimmy Ba
2023 Poster: TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation
Zhaoyan Liu · Noël Vouitsis · Satya Krishna Gorti · Jimmy Ba · Gabriel Loaiza-Ganem
2021 Poster: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy
2021 Spotlight: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy
2021 Poster: World Model as a Graph: Learning Latent Landmarks for Planning
Lunjun Zhang · Ge Yang · Bradly Stadie
2021 Oral: World Model as a Graph: Learning Latent Landmarks for Planning
Lunjun Zhang · Ge Yang · Bradly Stadie
2020 Poster: Improving Transformer Optimization Through Better Initialization
Xiao Shi Huang · Felipe Perez · Jimmy Ba · Maksims Volkovs