The recently proposed distributional approach to reinforcement learning (DiRL) is centered on learning the distribution of the reward-to-go, often referred to as the value distribution. In this work, we show that the distributional Bellman equation, which drives DiRL methods, is equivalent to a generative adversarial network (GAN) model. In this formulation, DiRL can be seen as learning a deep generative model of the value distribution, driven by the discrepancy between the distribution of the current value and the distribution of the sum of the current reward and next value. We use this insight to propose a GAN-based approach to DiRL, which leverages the strengths of GANs in learning distributions of high-dimensional data. In particular, we show that our GAN approach can be used for DiRL with multivariate rewards, an important setting that cannot be tackled with prior methods. The multivariate setting also allows us to unify learning the distribution of values and state transitions, and we exploit this idea to devise a novel exploration method that is driven by the discrepancy in estimating both values and states.
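The central object in the abstract is the discrepancy between samples of the current value distribution Z(s) and samples of the distributional Bellman target r + γZ(s'). The sketch below is only illustrative and is not the paper's implementation: the paper trains a GAN discriminator to measure this discrepancy, whereas here a sorted-samples empirical 1-Wasserstein distance stands in for the learned critic, and the function names are hypothetical.

```python
def bellman_target_samples(rewards, next_value_samples, gamma=0.99):
    """Sample-based distributional Bellman target: z = r + gamma * z'.

    Each target sample pairs an observed reward with a sample drawn
    from the next-state value distribution (illustrative stand-in for
    the generator's output in a Bellman-GAN setup).
    """
    return [r + gamma * z for r, z in zip(rewards, next_value_samples)]


def wasserstein_1(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size sample sets.

    In 1D this is the mean absolute difference of the sorted samples;
    it plays the role of the discrepancy that a GAN discriminator would
    estimate between current-value and Bellman-target samples.
    """
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


# Toy usage: samples from the "current" value distribution vs. the
# distributional Bellman target built from rewards and next-value samples.
current = [5.0, 6.0, 7.0]
target = bellman_target_samples([1.0, 1.0, 1.0], [4.0, 5.0, 6.0], gamma=0.9)
gap = wasserstein_1(current, target)  # discrepancy that would drive learning
```

In the GAN view described in the abstract, minimizing such a discrepancy over the generator's parameters is what makes the learned value distribution consistent with the distributional Bellman equation.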
Author Information
Dror Freirich (Technion)
Tzahi Shimkin (Technion - Israel Institute of Technology)
Ron Meir (Technion - Israel Institute of Technology)
Aviv Tamar (Technion)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN »
  Wed. Jun 12th, 10:05 -- 10:10 PM, Room: Hall B
More from the Same Authors
- 2023 Poster: Learning Control by Iterative Inversion »
  Gal Leibovich · Guy Jacob · Or Avner · Gal Novik · Aviv Tamar
- 2023 Poster: ContraBAR: Contrastive Bayes-Adaptive Deep RL »
  Era Choshen · Aviv Tamar
- 2023 Poster: TGRL: An Algorithm for Teacher Guided Reinforcement Learning »
  Idan Shenfeld · Zhang-Wei Hong · Aviv Tamar · Pulkit Agrawal
- 2022 Poster: Unsupervised Image Representation Learning with Deep Latent Particles »
  Tal Daniel · Aviv Tamar
- 2022 Spotlight: Unsupervised Image Representation Learning with Deep Latent Particles »
  Tal Daniel · Aviv Tamar
- 2021 Poster: Ensemble Bootstrapping for Q-Learning »
  Oren Peer · Chen Tessler · Nadav Merlis · Ron Meir
- 2021 Spotlight: Ensemble Bootstrapping for Q-Learning »
  Oren Peer · Chen Tessler · Nadav Merlis · Ron Meir
- 2020 Poster: Option Discovery in the Absence of Rewards with Manifold Analysis »
  Amitay Bar · Ronen Talmon · Ron Meir
- 2020 Poster: Hallucinative Topological Memory for Zero-Shot Visual Planning »
  Kara Liu · Thanard Kurutach · Christine Tung · Pieter Abbeel · Aviv Tamar
- 2020 Poster: Sub-Goal Trees -- a Framework for Goal-Based Reinforcement Learning »
  Tom Jurgenson · Or Avner · Edward Groshev · Aviv Tamar
- 2020 Poster: Discount Factor as a Regularizer in Reinforcement Learning »
  Ron Amit · Ron Meir · Kamil Ciosek
- 2019 Poster: A Deep Reinforcement Learning Perspective on Internet Congestion Control »
  Nathan Jay · Noga H. Rotman · Brighten Godfrey · Michael Schapira · Aviv Tamar
- 2019 Oral: A Deep Reinforcement Learning Perspective on Internet Congestion Control »
  Nathan Jay · Noga H. Rotman · Brighten Godfrey · Michael Schapira · Aviv Tamar
- 2018 Poster: Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory »
  Ron Amit · Ron Meir
- 2018 Oral: Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory »
  Ron Amit · Ron Meir
- 2017 Poster: Constrained Policy Optimization »
  Joshua Achiam · David Held · Aviv Tamar · Pieter Abbeel
- 2017 Talk: Constrained Policy Optimization »
  Joshua Achiam · David Held · Aviv Tamar · Pieter Abbeel