We study the sparse entropy-regularized reinforcement learning (ERL) problem, in which the entropy term is a special form of the Tsallis entropy. The optimal policy of this formulation is sparse, i.e., at each state it assigns non-zero probability to only a small number of actions. This addresses the main drawback of the standard Shannon entropy-regularized RL (soft ERL) formulation, in which the optimal policy is softmax and thus may assign non-negligible probability mass to non-optimal actions; this problem is aggravated as the number of actions grows. In this paper, following the work of Nachum et al. (2017) in the soft ERL setting, we propose a novel class of path consistency learning (PCL) algorithms, called sparse PCL, for the sparse ERL problem that can work with both on-policy and off-policy data. We first derive a sparse consistency equation that specifies a relationship between the optimal value function and policy of the sparse ERL problem along any system trajectory. Crucially, a weak form of the converse is also true: we quantify the sub-optimality of a policy that satisfies sparse consistency and show that, as the number of actions increases, this sub-optimality is smaller than that of the soft ERL optimal policy. We then use this result to derive the sparse PCL algorithms. We empirically compare sparse PCL with its soft counterpart and demonstrate its advantage, especially in problems with a large number of actions.
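Under this Tsallis-entropy formulation, the optimal policy takes the form of a sparsemax distribution rather than a softmax. Below is a minimal NumPy sketch of the sparsemax projection (Martins & Astudillo, 2016), contrasted with softmax; the function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of a score vector onto the probability simplex
    (Martins & Astudillo, 2016). Unlike softmax, the result is sparse:
    low-scoring entries receive exactly zero probability."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]           # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    # Support size: the largest k with 1 + k * z_(k) > sum of the top-k scores.
    k_z = k[1 + k * z_sorted > cumsum][-1]
    tau = (cumsum[k_z - 1] - 1.0) / k_z   # threshold subtracted from all scores
    return np.maximum(z - tau, 0.0)

# Illustrative Q-values for 4 actions: sparsemax keeps only the near-optimal
# ones, while softmax spreads non-negligible mass over every action.
q = np.array([3.0, 1.0, 0.2, 0.1])
print(sparsemax(q))                       # [1. 0. 0. 0.]
print(np.exp(q) / np.exp(q).sum())        # ~[0.80 0.11 0.05 0.04]
```

As the example suggests, the gap between the two widens with more actions: softmax leaks a little probability to every suboptimal action, and that leakage accumulates, while sparsemax truncates the tail entirely.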
Author Information
Yinlam Chow (DeepMind)
Ofir Nachum (Google Brain)
Mohammad Ghavamzadeh (Facebook AI Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Poster: Path Consistency Learning in Tsallis Entropy Regularized MDPs
  Wed. Jul 11th, 04:15 -- 07:00 PM, Hall B #172
More from the Same Authors
- 2021: SparseDice: Imitation Learning for Temporally Sparse Data via Regularization
  Alberto Camacho · Izzeddin Gur · Marcin Moczulski · Ofir Nachum · Aleksandra Faust
- 2021: Understanding the Generalization Gap in Visual Reinforcement Learning
  Anurag Ajay · Ge Yang · Ofir Nachum · Pulkit Agrawal
- 2022: SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition
  Dylan Slack · Yinlam Chow · Bo Dai · Nevan Wichers
- 2022 Poster: Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error
  Scott Fujimoto · David Meger · Doina Precup · Ofir Nachum · Shixiang Gu
- 2022 Poster: Model Selection in Batch Policy Optimization
  Jonathan Lee · George Tucker · Ofir Nachum · Bo Dai
- 2022 Spotlight: Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error
  Scott Fujimoto · David Meger · Doina Precup · Ofir Nachum · Shixiang Gu
- 2022 Spotlight: Model Selection in Batch Policy Optimization
  Jonathan Lee · George Tucker · Ofir Nachum · Bo Dai
- 2021 Poster: Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning
  Hiroki Furuta · Tatsuya Matsushima · Tadashi Kozuno · Yutaka Matsuo · Sergey Levine · Ofir Nachum · Shixiang Gu
- 2021 Poster: Offline Reinforcement Learning with Fisher Divergence Critic Regularization
  Ilya Kostrikov · Rob Fergus · Jonathan Tompson · Ofir Nachum
- 2021 Poster: Representation Matters: Offline Pretraining for Sequential Decision Making
  Mengjiao Yang · Ofir Nachum
- 2021 Spotlight: Representation Matters: Offline Pretraining for Sequential Decision Making
  Mengjiao Yang · Ofir Nachum
- 2021 Spotlight: Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning
  Hiroki Furuta · Tatsuya Matsushima · Tadashi Kozuno · Yutaka Matsuo · Sergey Levine · Ofir Nachum · Shixiang Gu
- 2021 Spotlight: Offline Reinforcement Learning with Fisher Divergence Critic Regularization
  Ilya Kostrikov · Rob Fergus · Jonathan Tompson · Ofir Nachum
- 2020: Panel discussion
  Neil Lawrence · Mohammad Ghavamzadeh · Leilani Gilpin · Huyen Nguyen · Ernest Mwebaze · Nevena Lalic
- 2020: Conservative Exploration in Bandits and Reinforcement Learning
  Mohammad Ghavamzadeh
- 2020 Poster: Predictive Coding for Locally-Linear Control
  Rui Shu · Tung Nguyen · Yinlam Chow · Tuan Pham · Khoat Than · Mohammad Ghavamzadeh · Stefano Ermon · Hung Bui
- 2020 Poster: Adaptive Sampling for Estimating Probability Distributions
  Shubhanshu Shekhar · Tara Javidi · Mohammad Ghavamzadeh
- 2020 Poster: Multi-step Greedy Reinforcement Learning Algorithms
  Manan Tomar · Yonathan Efroni · Mohammad Ghavamzadeh
- 2019: Panel discussion with Craig Boutilier (Google Research), Emma Brunskill (Stanford), Chelsea Finn (Google Brain, Stanford, UC Berkeley), Mohammad Ghavamzadeh (Facebook AI), John Langford (Microsoft Research) and David Silver (DeepMind)
  Peter Stone · Craig Boutilier · Emma Brunskill · Chelsea Finn · John Langford · David Silver · Mohammad Ghavamzadeh
- 2019: Posters
  Zhengxing Chen · Juan Jose Garau Luis · Ignacio Albert Smet · Aditya Modi · Sabina Tomkins · Riley Simmons-Edler · Hongzi Mao · Alexander Irpan · Hao Lu · Rose Wang · Subhojyoti Mukherjee · Aniruddh Raghu · Syed Arbab Mohd Shihab · Byung Hoon Ahn · Rasool Fakoor · Pratik Chaudhari · Elena Smirnova · Min-hwan Oh · Xiaocheng Tang · Tony Qin · Qingyang Li · Marc Brittain · Ian Fox · Supratik Paul · Xiaofeng Gao · Yinlam Chow · Gabriel Dulac-Arnold · Ofir Nachum · Nikos Karampatziakis · Bharathan Balaji · Supratik Paul · Ali Davody · Djallel Bouneffouf · Himanshu Sahni · Soo Kim · Andrey Kolobov · Alexander Amini · Yao Liu · Xinshi Chen · · Craig Boutilier
- 2019 Poster: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
  Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh
- 2019 Oral: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
  Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh
- 2019 Poster: DeepMDP: Learning Continuous Latent Space Models for Representation Learning
  Carles Gelada · Saurabh Kumar · Jacob Buckman · Ofir Nachum · Marc Bellemare
- 2019 Oral: DeepMDP: Learning Continuous Latent Space Models for Representation Learning
  Carles Gelada · Saurabh Kumar · Jacob Buckman · Ofir Nachum · Marc Bellemare
- 2018 Poster: Smoothed Action Value Functions for Learning Gaussian Policies
  Ofir Nachum · Mohammad Norouzi · George Tucker · Dale Schuurmans
- 2018 Oral: Smoothed Action Value Functions for Learning Gaussian Policies
  Ofir Nachum · Mohammad Norouzi · George Tucker · Dale Schuurmans
- 2018 Poster: More Robust Doubly Robust Off-policy Evaluation
  Mehrdad Farajtabar · Yinlam Chow · Mohammad Ghavamzadeh
- 2018 Oral: More Robust Doubly Robust Off-policy Evaluation
  Mehrdad Farajtabar · Yinlam Chow · Mohammad Ghavamzadeh
- 2017 Poster: Active Learning for Accurate Estimation of Linear Models
  Carlos Riquelme Ruiz · Mohammad Ghavamzadeh · Alessandro Lazaric
- 2017 Poster: Model-Independent Online Learning for Influence Maximization
  Sharan Vaswani · Branislav Kveton · Zheng Wen · Mohammad Ghavamzadeh · Laks V.S Lakshmanan · Mark Schmidt
- 2017 Poster: Online Learning to Rank in Stochastic Click Models
  Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen
- 2017 Poster: Bottleneck Conditional Density Estimation
  Rui Shu · Hung Bui · Mohammad Ghavamzadeh
- 2017 Talk: Active Learning for Accurate Estimation of Linear Models
  Carlos Riquelme Ruiz · Mohammad Ghavamzadeh · Alessandro Lazaric
- 2017 Talk: Bottleneck Conditional Density Estimation
  Rui Shu · Hung Bui · Mohammad Ghavamzadeh
- 2017 Talk: Online Learning to Rank in Stochastic Click Models
  Masrour Zoghi · Tomas Tunys · Mohammad Ghavamzadeh · Branislav Kveton · Csaba Szepesvari · Zheng Wen
- 2017 Talk: Model-Independent Online Learning for Influence Maximization
  Sharan Vaswani · Branislav Kveton · Zheng Wen · Mohammad Ghavamzadeh · Laks V.S Lakshmanan · Mark Schmidt