Timezone: »
The recent success of supervised learning methods on ever larger offline datasets has spurred interest in the reinforcement learning (RL) field to investigate whether the same paradigms can be translated to RL algorithms. This research area, known as offline RL, has largely focused on offline policy optimization, aiming to find a return-maximizing policy exclusively from offline data. In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making. We aim to answer the question, what unsupervised objectives applied to offline datasets are able to learn state representations which elevate performance on downstream tasks, whether those downstream tasks be online RL, imitation learning from expert demonstrations, or even offline policy optimization based on the same offline dataset? Through a variety of experiments utilizing standard offline RL datasets, we find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms that otherwise yield mediocre performance on their own. Extensive ablations further provide insights into what components of these unsupervised objectives – e.g., reward prediction, continuous or discrete representations, pretraining or finetuning – are most important and in which settings.
Author Information
Mengjiao Yang (Google Brain)
Ofir Nachum (Google Brain)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: Representation Matters: Offline Pretraining for Sequential Decision Making »
Wed. Jul 21st 04:00 -- 06:00 AM Room Virtual
More from the Same Authors
-
2021 : SparseDice: Imitation Learning for Temporally Sparse Data via Regularization »
Alberto Camacho · Izzeddin Gur · Marcin Moczulski · Ofir Nachum · Aleksandra Faust -
2021 : Understanding the Generalization Gap in Visual Reinforcement Learning »
Anurag Ajay · Ge Yang · Ofir Nachum · Pulkit Agrawal -
2023 Poster: Multi-Environment Pretraining Enables Transfer to Action Limited Datasets »
David Venuto · Mengjiao Yang · Pieter Abbeel · Doina Precup · Igor Mordatch · Ofir Nachum -
2022 Poster: Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error »
Scott Fujimoto · David Meger · Doina Precup · Ofir Nachum · Shixiang Gu -
2022 Poster: Model Selection in Batch Policy Optimization »
Jonathan Lee · George Tucker · Ofir Nachum · Bo Dai -
2022 Poster: Making Linear MDPs Practical via Contrastive Representation Learning »
Tianjun Zhang · Tongzheng Ren · Mengjiao Yang · Joseph E Gonzalez · Dale Schuurmans · Bo Dai -
2022 Spotlight: Making Linear MDPs Practical via Contrastive Representation Learning »
Tianjun Zhang · Tongzheng Ren · Mengjiao Yang · Joseph E Gonzalez · Dale Schuurmans · Bo Dai -
2022 Spotlight: Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error »
Scott Fujimoto · David Meger · Doina Precup · Ofir Nachum · Shixiang Gu -
2022 Spotlight: Model Selection in Batch Policy Optimization »
Jonathan Lee · George Tucker · Ofir Nachum · Bo Dai -
2022 Poster: Marginal Distribution Adaptation for Discrete Sets via Module-Oriented Divergence Minimization »
Hanjun Dai · Mengjiao Yang · Yuan Xue · Dale Schuurmans · Bo Dai -
2022 Spotlight: Marginal Distribution Adaptation for Discrete Sets via Module-Oriented Divergence Minimization »
Hanjun Dai · Mengjiao Yang · Yuan Xue · Dale Schuurmans · Bo Dai -
2021 Poster: Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning »
Hiroki Furuta · Tatsuya Matsushima · Tadashi Kozuno · Yutaka Matsuo · Sergey Levine · Ofir Nachum · Shixiang Gu -
2021 Poster: Offline Reinforcement Learning with Fisher Divergence Critic Regularization »
Ilya Kostrikov · Rob Fergus · Jonathan Tompson · Ofir Nachum -
2021 Spotlight: Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning »
Hiroki Furuta · Tatsuya Matsushima · Tadashi Kozuno · Yutaka Matsuo · Sergey Levine · Ofir Nachum · Shixiang Gu -
2021 Spotlight: Offline Reinforcement Learning with Fisher Divergence Critic Regularization »
Ilya Kostrikov · Rob Fergus · Jonathan Tompson · Ofir Nachum -
2020 Poster: Energy-Based Processes for Exchangeable Data »
Mengjiao Yang · Bo Dai · Hanjun Dai · Dale Schuurmans -
2019 : posters »
Zhengxing Chen · Juan Jose Garau Luis · Ignacio Albert Smet · Aditya Modi · Sabina Tomkins · Riley Simmons-Edler · Hongzi Mao · Alexander Irpan · Hao Lu · Rose Wang · Subhojyoti Mukherjee · Aniruddh Raghu · Syed Arbab Mohd Shihab · Byung Hoon Ahn · Rasool Fakoor · Pratik Chaudhari · Elena Smirnova · Min-hwan Oh · Xiaocheng Tang · Tony Qin · Qingyang Li · Marc Brittain · Ian Fox · Supratik Paul · Xiaofeng Gao · Yinlam Chow · Gabriel Dulac-Arnold · Ofir Nachum · Nikos Karampatziakis · Bharathan Balaji · Supratik Paul · Ali Davody · Djallel Bouneffouf · Himanshu Sahni · Soo Kim · Andrey Kolobov · Alexander Amini · Yao Liu · Xinshi Chen · · Craig Boutilier -
2019 Poster: DeepMDP: Learning Continuous Latent Space Models for Representation Learning »
Carles Gelada · Saurabh Kumar · Jacob Buckman · Ofir Nachum · Marc Bellemare -
2019 Oral: DeepMDP: Learning Continuous Latent Space Models for Representation Learning »
Carles Gelada · Saurabh Kumar · Jacob Buckman · Ofir Nachum · Marc Bellemare -
2018 Poster: Smoothed Action Value Functions for Learning Gaussian Policies »
Ofir Nachum · Mohammad Norouzi · George Tucker · Dale Schuurmans -
2018 Oral: Smoothed Action Value Functions for Learning Gaussian Policies »
Ofir Nachum · Mohammad Norouzi · George Tucker · Dale Schuurmans -
2018 Poster: Path Consistency Learning in Tsallis Entropy Regularized MDPs »
Yinlam Chow · Ofir Nachum · Mohammad Ghavamzadeh -
2018 Oral: Path Consistency Learning in Tsallis Entropy Regularized MDPs »
Yinlam Chow · Ofir Nachum · Mohammad Ghavamzadeh