State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment. Moreover, the gradients of expected reward with respect to the mean and covariance of a parameterized Gaussian policy can be recovered from the gradient and Hessian of the smoothed Q-value function. Based on these relationships we develop new algorithms for training a Gaussian policy directly from a learned smoothed Q-value approximator. The approach is additionally amenable to proximal optimization by augmenting the objective with a penalty on KL-divergence from a previous policy. We find that the ability to learn both a mean and covariance during training leads to significantly improved results on standard continuous control benchmarks.
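To make the abstract's objects concrete, the following is a brief sketch in assumed notation, not quoted from the paper: the policy is Gaussian, π(a|s) = N(a; μ(s), Σ(s)), Q^π is its ordinary action value, r the reward, γ the discount, and Q̃^π the smoothed action value. The second line is one consistent way to write the Bellman identity the abstract mentions; the gradient identities follow from standard Gaussian-expectation calculus (Bonnet's and Price's theorems).
\[
\tilde{Q}^\pi(s,a) \;=\; \mathbb{E}_{\tilde{a} \sim \mathcal{N}(a,\,\Sigma(s))}\big[\,Q^\pi(s,\tilde{a})\,\big]
\]
\[
\tilde{Q}^\pi(s,a) \;=\; \mathbb{E}_{\tilde{a} \sim \mathcal{N}(a,\,\Sigma(s)),\; s' \sim P(\cdot\,|\,s,\tilde{a})}\big[\,r(s,\tilde{a}) + \gamma\,\tilde{Q}^\pi(s',\mu(s'))\,\big]
\]
\[
\nabla_{\mu(s)}\, \mathbb{E}_{a \sim \pi(\cdot|s)}\big[Q^\pi(s,a)\big] \;=\; \nabla_a \tilde{Q}^\pi(s,a)\Big|_{a=\mu(s)},
\qquad
\nabla_{\Sigma(s)}\, \mathbb{E}_{a \sim \pi(\cdot|s)}\big[Q^\pi(s,a)\big] \;=\; \tfrac{1}{2}\,\nabla^2_a \tilde{Q}^\pi(s,a)\Big|_{a=\mu(s)}.
\]
A learned approximator of Q̃^π therefore supplies, through its action-gradient and action-Hessian at a = μ(s), update directions for the policy mean and covariance, which is the relationship the algorithms described above build on.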
Author Information
Ofir Nachum (Google Brain)
Mohammad Norouzi (Google Brain)
George Tucker (Google Brain)
Dale Schuurmans (University of Alberta)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Poster: Smoothed Action Value Functions for Learning Gaussian Policies
  Thu. Jul 12th 04:15 -- 07:00 PM, Room Hall B #42
More from the Same Authors
- 2021 : Improved Estimator Selection for Off-Policy Evaluation
  George Tucker
- 2021 : Value-Based Deep Reinforcement Learning Requires Explicit Regularization
  Aviral Kumar · Rishabh Agarwal · Aaron Courville · Tengyu Ma · George Tucker · Sergey Levine
- 2021 : SparseDice: Imitation Learning for Temporally Sparse Data via Regularization
  Alberto Camacho · Izzeddin Gur · Marcin Moczulski · Ofir Nachum · Aleksandra Faust
- 2021 : Understanding the Generalization Gap in Visual Reinforcement Learning
  Anurag Ajay · Ge Yang · Ofir Nachum · Pulkit Agrawal
- 2022 : Multimodal Masked Autoencoders Learn Transferable Representations
  Xinyang Geng · Hao Liu · Lisa Lee · Dale Schuurmans · Sergey Levine · Pieter Abbeel
- 2023 Poster: Revisiting Sampling for Combinatorial Optimization
  Haoran Sun · Katayoon Goshvadi · Azade Nova · Dale Schuurmans · Hanjun Dai
- 2023 Poster: Stochastic Gradient Succeeds for Bandits
  Jincheng Mei · Zixin Zhong · Bo Dai · Alekh Agarwal · Csaba Szepesvari · Dale Schuurmans
- 2023 Poster: Gradient-Free Structured Pruning with Unlabeled Data
  Azade Nova · Hanjun Dai · Dale Schuurmans
- 2023 Poster: Multi-Environment Pretraining Enables Transfer to Action Limited Datasets
  David Venuto · Mengjiao Yang · Pieter Abbeel · Doina Precup · Igor Mordatch · Ofir Nachum
- 2022 Poster: Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error
  Scott Fujimoto · David Meger · Doina Precup · Ofir Nachum · Shixiang Gu
- 2022 Poster: Model Selection in Batch Policy Optimization
  Jonathan Lee · George Tucker · Ofir Nachum · Bo Dai
- 2022 Spotlight: Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error
  Scott Fujimoto · David Meger · Doina Precup · Ofir Nachum · Shixiang Gu
- 2022 Spotlight: Model Selection in Batch Policy Optimization
  Jonathan Lee · George Tucker · Ofir Nachum · Bo Dai
- 2021 Poster: Leveraging Non-uniformity in First-order Non-convex Optimization
  Jincheng Mei · Yue Gao · Bo Dai · Csaba Szepesvari · Dale Schuurmans
- 2021 Spotlight: Leveraging Non-uniformity in First-order Non-convex Optimization
  Jincheng Mei · Yue Gao · Bo Dai · Csaba Szepesvari · Dale Schuurmans
- 2021 Poster: Characterizing the Gap Between Actor-Critic and Policy Gradient
  Junfeng Wen · Saurabh Kumar · Ramki Gummadi · Dale Schuurmans
- 2021 Spotlight: Characterizing the Gap Between Actor-Critic and Policy Gradient
  Junfeng Wen · Saurabh Kumar · Ramki Gummadi · Dale Schuurmans
- 2021 Poster: Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning
  Hiroki Furuta · Tatsuya Matsushima · Tadashi Kozuno · Yutaka Matsuo · Sergey Levine · Ofir Nachum · Shixiang Gu
- 2021 Poster: Offline Reinforcement Learning with Fisher Divergence Critic Regularization
  Ilya Kostrikov · Rob Fergus · Jonathan Tompson · Ofir Nachum
- 2021 Poster: Representation Matters: Offline Pretraining for Sequential Decision Making
  Mengjiao Yang · Ofir Nachum
- 2021 Spotlight: Representation Matters: Offline Pretraining for Sequential Decision Making
  Mengjiao Yang · Ofir Nachum
- 2021 Spotlight: Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning
  Hiroki Furuta · Tatsuya Matsushima · Tadashi Kozuno · Yutaka Matsuo · Sergey Levine · Ofir Nachum · Shixiang Gu
- 2021 Spotlight: Offline Reinforcement Learning with Fisher Divergence Critic Regularization
  Ilya Kostrikov · Rob Fergus · Jonathan Tompson · Ofir Nachum
- 2020 Poster: On the Global Convergence Rates of Softmax Policy Gradient Methods
  Jincheng Mei · Chenjun Xiao · Csaba Szepesvari · Dale Schuurmans
- 2020 Poster: Domain Aggregation Networks for Multi-Source Domain Adaptation
  Junfeng Wen · Russell Greiner · Dale Schuurmans
- 2020 Poster: Batch Stationary Distribution Estimation
  Junfeng Wen · Bo Dai · Lihong Li · Dale Schuurmans
- 2020 Poster: Imputer: Sequence Modelling via Imputation and Dynamic Programming
  William Chan · Chitwan Saharia · Geoffrey Hinton · Mohammad Norouzi · Navdeep Jaitly
- 2020 Poster: An Optimistic Perspective on Offline Deep Reinforcement Learning
  Rishabh Agarwal · Dale Schuurmans · Mohammad Norouzi
- 2020 Poster: A Simple Framework for Contrastive Learning of Visual Representations
  Ting Chen · Simon Kornblith · Mohammad Norouzi · Geoffrey Hinton
- 2019 : posters
  Zhengxing Chen · Juan Jose Garau Luis · Ignacio Albert Smet · Aditya Modi · Sabina Tomkins · Riley Simmons-Edler · Hongzi Mao · Alexander Irpan · Hao Lu · Rose Wang · Subhojyoti Mukherjee · Aniruddh Raghu · Syed Arbab Mohd Shihab · Byung Hoon Ahn · Rasool Fakoor · Pratik Chaudhari · Elena Smirnova · Min-hwan Oh · Xiaocheng Tang · Tony Qin · Qingyang Li · Marc Brittain · Ian Fox · Supratik Paul · Xiaofeng Gao · Yinlam Chow · Gabriel Dulac-Arnold · Ofir Nachum · Nikos Karampatziakis · Bharathan Balaji · Supratik Paul · Ali Davody · Djallel Bouneffouf · Himanshu Sahni · Soo Kim · Andrey Kolobov · Alexander Amini · Yao Liu · Xinshi Chen · · Craig Boutilier
- 2019 Poster: Guided evolutionary strategies: augmenting random search with surrogate gradients
  Niru Maheswaranathan · Luke Metz · George Tucker · Dami Choi · Jascha Sohl-Dickstein
- 2019 Poster: Similarity of Neural Network Representations Revisited
  Simon Kornblith · Mohammad Norouzi · Honglak Lee · Geoffrey Hinton
- 2019 Poster: On Variational Bounds of Mutual Information
  Ben Poole · Sherjil Ozair · Aäron van den Oord · Alexander Alemi · George Tucker
- 2019 Poster: Learning to Generalize from Sparse and Underspecified Rewards
  Rishabh Agarwal · Chen Liang · Dale Schuurmans · Mohammad Norouzi
- 2019 Oral: Guided evolutionary strategies: augmenting random search with surrogate gradients
  Niru Maheswaranathan · Luke Metz · George Tucker · Dami Choi · Jascha Sohl-Dickstein
- 2019 Oral: On Variational Bounds of Mutual Information
  Ben Poole · Sherjil Ozair · Aäron van den Oord · Alexander Alemi · George Tucker
- 2019 Oral: Similarity of Neural Network Representations Revisited
  Simon Kornblith · Mohammad Norouzi · Honglak Lee · Geoffrey Hinton
- 2019 Oral: Learning to Generalize from Sparse and Underspecified Rewards
  Rishabh Agarwal · Chen Liang · Dale Schuurmans · Mohammad Norouzi
- 2019 Poster: Understanding the Impact of Entropy on Policy Optimization
  Zafarali Ahmed · Nicolas Le Roux · Mohammad Norouzi · Dale Schuurmans
- 2019 Oral: Understanding the Impact of Entropy on Policy Optimization
  Zafarali Ahmed · Nicolas Le Roux · Mohammad Norouzi · Dale Schuurmans
- 2019 Poster: DeepMDP: Learning Continuous Latent Space Models for Representation Learning
  Carles Gelada · Saurabh Kumar · Jacob Buckman · Ofir Nachum · Marc Bellemare
- 2019 Oral: DeepMDP: Learning Continuous Latent Space Models for Representation Learning
  Carles Gelada · Saurabh Kumar · Jacob Buckman · Ofir Nachum · Marc Bellemare
- 2018 Poster: The Mirage of Action-Dependent Baselines in Reinforcement Learning
  George Tucker · Surya Bhupatiraju · Shixiang Gu · Richard E Turner · Zoubin Ghahramani · Sergey Levine
- 2018 Oral: The Mirage of Action-Dependent Baselines in Reinforcement Learning
  George Tucker · Surya Bhupatiraju · Shixiang Gu · Richard E Turner · Zoubin Ghahramani · Sergey Levine
- 2018 Poster: Path Consistency Learning in Tsallis Entropy Regularized MDPs
  Yinlam Chow · Ofir Nachum · Mohammad Ghavamzadeh
- 2018 Oral: Path Consistency Learning in Tsallis Entropy Regularized MDPs
  Yinlam Chow · Ofir Nachum · Mohammad Ghavamzadeh
- 2017 Poster: Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs
  Michael Gygli · Mohammad Norouzi · Anelia Angelova
- 2017 Poster: Device Placement Optimization with Reinforcement Learning
  Azalia Mirhoseini · Hieu Pham · Quoc Le · benoit steiner · Mohammad Norouzi · Rasmus Larsen · Yuefeng Zhou · Naveen Kumar · Samy Bengio · Jeff Dean
- 2017 Talk: Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs
  Michael Gygli · Mohammad Norouzi · Anelia Angelova
- 2017 Talk: Device Placement Optimization with Reinforcement Learning
  Azalia Mirhoseini · Hieu Pham · Quoc Le · benoit steiner · Mohammad Norouzi · Rasmus Larsen · Yuefeng Zhou · Naveen Kumar · Samy Bengio · Jeff Dean
- 2017 Poster: Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
  Cinjon Resnick · Adam Roberts · Jesse Engel · Douglas Eck · Sander Dieleman · Karen Simonyan · Mohammad Norouzi
- 2017 Talk: Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
  Cinjon Resnick · Adam Roberts · Jesse Engel · Douglas Eck · Sander Dieleman · Karen Simonyan · Mohammad Norouzi