Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance. Several recent papers extend the baseline to depend on both the state and action and suggest that this significantly reduces variance and improves sample efficiency without introducing bias into the gradient estimates. To better understand this development, we decompose the variance of the policy gradient estimator and numerically show that learned state-action-dependent baselines do not in fact reduce variance over a state-dependent baseline in commonly tested benchmark domains. We confirm this unexpected result by reviewing the open-source code accompanying these prior papers, and show that subtle implementation decisions cause deviations from the methods presented in the papers and explain the source of the previously observed empirical gains. Furthermore, the variance decomposition highlights areas for improvement, which we demonstrate by illustrating a simple change to the typical value function parameterization that can significantly improve performance.
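The premise the abstract starts from, that subtracting a baseline reduces the variance of the score-function gradient estimator without biasing it, can be illustrated on a toy problem. The sketch below is not the paper's experiment or code. It assumes a hypothetical single-state task with a unit-variance Gaussian policy and a quadratic reward, so the "state-dependent" baseline collapses to a constant equal to the expected reward. The function name `score_function_gradients` and all numeric settings are illustrative assumptions.

```python
import numpy as np


def score_function_gradients(mu, baseline, n_samples, seed=0):
    """Per-sample score-function gradient estimates w.r.t. the policy mean.

    Assumed toy setup (not from the paper): one state, policy a ~ N(mu, 1),
    reward r(a) = -(a - 3)^2.
    """
    rng = np.random.default_rng(seed)
    actions = rng.normal(loc=mu, scale=1.0, size=n_samples)
    rewards = -(actions - 3.0) ** 2
    # Gradient of log N(a; mu, 1) with respect to mu.
    score = actions - mu
    return score * (rewards - baseline)


mu = 0.0
n_samples = 200_000

# Analytic quantities for this toy problem.
true_grad = -2.0 * (mu - 3.0)                 # d/dmu E[r(a)]
expected_reward = -(1.0 + (mu - 3.0) ** 2)    # E[r(a)] under the policy

g_plain = score_function_gradients(mu, baseline=0.0, n_samples=n_samples)
g_base = score_function_gradients(mu, baseline=expected_reward,
                                  n_samples=n_samples)

print("true gradient:        ", true_grad)
print("mean, no baseline:    ", g_plain.mean())
print("mean, with baseline:  ", g_base.mean())
print("variance, no baseline:", g_plain.var())
print("variance, baseline:   ", g_base.var())
```

Both estimators should average to roughly the same gradient, while the one with the baseline shows a smaller per-sample variance. Whether an action-dependent baseline improves further on this is the question the paper examines; the toy example does not address it.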
Author Information
George Tucker (Google Brain)
Surya Bhupatiraju (Google Brain)
Shixiang Gu (Cambridge)
Richard E Turner (University of Cambridge)
Richard Turner holds a Lectureship (equivalent to US Assistant Professor) in Computer Vision and Machine Learning in the Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, UK. He is a Fellow of Christ's College Cambridge. Previously, he held an EPSRC Postdoctoral research fellowship which he spent at both the University of Cambridge and the Laboratory for Computational Vision, NYU, USA. He has a PhD degree in Computational Neuroscience and Machine Learning from the Gatsby Computational Neuroscience Unit, UCL, UK and an M.Sci. degree in Natural Sciences (specialism Physics) from the University of Cambridge, UK. His research interests include machine learning, signal processing and developing probabilistic models of perception.
Zoubin Ghahramani (University of Cambridge & Uber)
Zoubin Ghahramani is a Professor at the University of Cambridge, and Chief Scientist at Uber. He is also Deputy Director of the Leverhulme Centre for the Future of Intelligence, was a founding Director of the Alan Turing Institute and co-founder of Geometric Intelligence (now Uber AI Labs). His research focuses on probabilistic approaches to machine learning and AI. In 2015 he was elected a Fellow of the Royal Society.
Sergey Levine (Berkeley)
Sergey Levine received a BS and MS in Computer Science from Stanford University in 2009, and a Ph.D. in Computer Science from Stanford University in 2014. He joined the faculty of the Department of Electrical Engineering and Computer Sciences at UC Berkeley in fall 2016. His work focuses on machine learning for decision making and control, with an emphasis on deep learning and reinforcement learning algorithms. Applications of his work include autonomous robots and vehicles, as well as computer vision and graphics. His research includes developing algorithms for end-to-end training of deep neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, deep reinforcement learning algorithms, and more.
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Poster: The Mirage of Action-Dependent Baselines in Reinforcement Learning »
Thu. Jul 12th 04:15 -- 07:00 PM Room Hall B #30
More from the Same Authors
-
2021 : Attacking Few-Shot Classifiers with Adversarial Support Poisoning »
Elre Oldewage · John Bronskill · Richard E Turner -
2021 : Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability »
Dibya Ghosh · Jad Rahme · Aviral Kumar · Amy Zhang · Ryan P. Adams · Sergey Levine -
2021 : Improved Estimator Selection for Off-Policy Evaluation »
George Tucker -
2021 : Value-Based Deep Reinforcement Learning Requires Explicit Regularization »
Aviral Kumar · Rishabh Agarwal · Aaron Courville · Tengyu Ma · George Tucker · Sergey Levine -
2021 : Multi-Task Offline Reinforcement Learning with Conservative Data Sharing »
Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Sergey Levine · Chelsea Finn -
2021 : Reinforcement Learning as One Big Sequence Modeling Problem »
Michael Janner · Qiyang Li · Sergey Levine -
2021 : ReLMM: Practical RL for Learning Mobile Manipulation Skills Using Only Onboard Sensors »
Charles Sun · Jedrzej Orbik · Coline Devin · Abhishek Gupta · Glen Berseth · Sergey Levine -
2022 : Plex: Towards Reliability using Pretrained Large Model Extensions »
Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · Jie Ren · Joost van Amersfoort · Kehang Han · E. Kelly Buchanan · Kevin Murphy · Mark Collier · Mike Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani -
2022 : Distributionally Adaptive Meta Reinforcement Learning »
Anurag Ajay · Dibya Ghosh · Sergey Levine · Pulkit Agrawal · Abhishek Gupta -
2023 : Beyond Intuition, a Framework for Applying GPs to Real-World Data »
Kenza Tazi · Jihao Andreas Lin · ST John · Hong Ge · Richard E Turner · Ross Viljoen · Alex Gardner -
2023 : Guided Evolution with Binary Predictors for ML Program Search »
John Co-Reyes · Yingjie Miao · George Tucker · Aleksandra Faust · Esteban Real -
2023 : Modeling Accurate Long Rollouts with Temporal Neural PDE Solvers »
Phillip Lippe · Bastiaan Veeling · Paris Perdikaris · Richard E Turner · Johannes Brandstetter -
2023 Poster: Neural Diffusion Processes »
Vincent Dutordoir · Alan Saul · Zoubin Ghahramani · Fergus Simpson -
2022 : Q/A Sergey Levine »
Sergey Levine -
2022 : Invited Speaker: Sergey Levine »
Sergey Levine -
2022 Poster: Offline Meta-Reinforcement Learning with Online Self-Supervision »
Vitchyr Pong · Ashvin Nair · Laura Smith · Catherine Huang · Sergey Levine -
2022 Poster: Model Selection in Batch Policy Optimization »
Jonathan Lee · George Tucker · Ofir Nachum · Bo Dai -
2022 Poster: Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization »
Brandon Trabucco · Xinyang Geng · Aviral Kumar · Sergey Levine -
2022 Poster: How to Leverage Unlabeled Data in Offline Reinforcement Learning »
Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Chelsea Finn · Sergey Levine -
2022 Spotlight: How to Leverage Unlabeled Data in Offline Reinforcement Learning »
Tianhe (Kevin) Yu · Aviral Kumar · Yevgen Chebotar · Karol Hausman · Chelsea Finn · Sergey Levine -
2022 Spotlight: Model Selection in Batch Policy Optimization »
Jonathan Lee · George Tucker · Ofir Nachum · Bo Dai -
2022 Spotlight: Offline Meta-Reinforcement Learning with Online Self-Supervision »
Vitchyr Pong · Ashvin Nair · Laura Smith · Catherine Huang · Sergey Levine -
2022 Spotlight: Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization »
Brandon Trabucco · Xinyang Geng · Aviral Kumar · Sergey Levine -
2022 Poster: Planning with Diffusion for Flexible Behavior Synthesis »
Michael Janner · Yilun Du · Josh Tenenbaum · Sergey Levine -
2022 Oral: Planning with Diffusion for Flexible Behavior Synthesis »
Michael Janner · Yilun Du · Josh Tenenbaum · Sergey Levine -
2022 Poster: Offline RL Policies Should Be Trained to be Adaptive »
Dibya Ghosh · Anurag Ajay · Pulkit Agrawal · Sergey Levine -
2022 Oral: Offline RL Policies Should Be Trained to be Adaptive »
Dibya Ghosh · Anurag Ajay · Pulkit Agrawal · Sergey Levine -
2021 Poster: Simple and Effective VAE Training with Calibrated Decoders »
Oleh Rybkin · Kostas Daniilidis · Sergey Levine -
2021 Poster: WILDS: A Benchmark of in-the-Wild Distribution Shifts »
Pang Wei Koh · Shiori Sagawa · Henrik Marklund · Sang Michael Xie · Marvin Zhang · Akshay Balsubramani · Weihua Hu · Michihiro Yasunaga · Richard Lanas Phillips · Irena Gao · Tony Lee · Etienne David · Ian Stavness · Wei Guo · Berton Earnshaw · Imran Haque · Sara Beery · Jure Leskovec · Anshul Kundaje · Emma Pierson · Sergey Levine · Chelsea Finn · Percy Liang -
2021 Oral: WILDS: A Benchmark of in-the-Wild Distribution Shifts »
Pang Wei Koh · Shiori Sagawa · Henrik Marklund · Sang Michael Xie · Marvin Zhang · Akshay Balsubramani · Weihua Hu · Michihiro Yasunaga · Richard Lanas Phillips · Irena Gao · Tony Lee · Etienne David · Ian Stavness · Wei Guo · Berton Earnshaw · Imran Haque · Sara Beery · Jure Leskovec · Anshul Kundaje · Emma Pierson · Sergey Levine · Chelsea Finn · Percy Liang -
2021 Spotlight: Simple and Effective VAE Training with Calibrated Decoders »
Oleh Rybkin · Kostas Daniilidis · Sergey Levine -
2021 Poster: Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment »
Michael Chang · Sid Kaushik · Sergey Levine · Thomas Griffiths -
2021 Poster: Conservative Objective Models for Effective Offline Model-Based Optimization »
Brandon Trabucco · Aviral Kumar · Xinyang Geng · Sergey Levine -
2021 Spotlight: Conservative Objective Models for Effective Offline Model-Based Optimization »
Brandon Trabucco · Aviral Kumar · Xinyang Geng · Sergey Levine -
2021 Oral: Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment »
Michael Chang · Sid Kaushik · Sergey Levine · Thomas Griffiths -
2021 Poster: Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning »
Hiroki Furuta · Tatsuya Matsushima · Tadashi Kozuno · Yutaka Matsuo · Sergey Levine · Ofir Nachum · Shixiang Gu -
2021 Poster: MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning »
Kevin Li · Abhishek Gupta · Ashwin D Reddy · Vitchyr Pong · Aurick Zhou · Justin Yu · Sergey Levine -
2021 Poster: PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning »
Angelos Filos · Clare Lyle · Yarin Gal · Sergey Levine · Natasha Jaques · Gregory Farquhar -
2021 Spotlight: MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning »
Kevin Li · Abhishek Gupta · Ashwin D Reddy · Vitchyr Pong · Aurick Zhou · Justin Yu · Sergey Levine -
2021 Spotlight: Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning »
Hiroki Furuta · Tatsuya Matsushima · Tadashi Kozuno · Yutaka Matsuo · Sergey Levine · Ofir Nachum · Shixiang Gu -
2021 Oral: PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning »
Angelos Filos · Clare Lyle · Yarin Gal · Sergey Levine · Natasha Jaques · Gregory Farquhar -
2021 Poster: Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation »
Aurick Zhou · Sergey Levine -
2021 Poster: Model-Based Reinforcement Learning via Latent-Space Collocation »
Oleh Rybkin · Chuning Zhu · Anusha Nagabandi · Kostas Daniilidis · Igor Mordatch · Sergey Levine -
2021 Spotlight: Model-Based Reinforcement Learning via Latent-Space Collocation »
Oleh Rybkin · Chuning Zhu · Anusha Nagabandi · Kostas Daniilidis · Igor Mordatch · Sergey Levine -
2021 Spotlight: Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation »
Aurick Zhou · Sergey Levine -
2020 : Invited Talk 9: Prof. Sergey Levine from UC Berkeley »
Sergey Levine -
2020 Poster: Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions »
Michael Chang · Sid Kaushik · S. Matthew Weinberg · Thomas Griffiths · Sergey Levine -
2020 Poster: Learning Human Objectives by Evaluating Hypothetical Behavior »
Siddharth Reddy · Anca Dragan · Sergey Levine · Shane Legg · Jan Leike -
2020 Poster: Skew-Fit: State-Covering Self-Supervised Reinforcement Learning »
Vitchyr Pong · Murtaza Dalal · Steven Lin · Ashvin Nair · Shikhar Bahl · Sergey Levine -
2020 Poster: Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits »
Robert Peharz · Steven Lang · Antonio Vergari · Karl Stelzner · Alejandro Molina · Martin Trapp · Guy Van den Broeck · Kristian Kersting · Zoubin Ghahramani -
2020 Poster: Scalable Exact Inference in Multi-Output Gaussian Processes »
Wessel Bruinsma · Eric Perim Martins · William Tebbutt · Scott Hosking · Arno Solin · Richard E Turner -
2020 Poster: Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts? »
Angelos Filos · Panagiotis Tigas · Rowan McAllister · Nicholas Rhinehart · Sergey Levine · Yarin Gal -
2020 Poster: Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings »
Jesse Zhang · Brian Cheung · Chelsea Finn · Sergey Levine · Dinesh Jayaraman -
2020 Poster: TaskNorm: Rethinking Batch Normalization for Meta-Learning »
John Bronskill · Jonathan Gordon · James Requeima · Sebastian Nowozin · Richard E Turner -
2019 : Sergey Levine: "Imitation, Prediction, and Model-Based Reinforcement Learning for Autonomous Driving" »
Sergey Levine -
2019 : Sergey Levine: Unsupervised Reinforcement Learning and Meta-Learning »
Sergey Levine -
2019 Workshop: Exploration in Reinforcement Learning Workshop »
Benjamin Eysenbach · Surya Bhupatiraju · Shixiang Gu · Harrison Edwards · Martha White · Pierre-Yves Oudeyer · Kenneth Stanley · Emma Brunskill -
2019 Workshop: ICML Workshop on Imitation, Intent, and Interaction (I3) »
Nicholas Rhinehart · Sergey Levine · Chelsea Finn · He He · Ilya Kostrikov · Justin Fu · Siddharth Reddy -
2019 : Sergey Levine: Distribution Matching and Mutual Information in Reinforcement Learning »
Sergey Levine -
2019 Workshop: Generative Modeling and Model-Based Reasoning for Robotics and AI »
Aravind Rajeswaran · Emanuel Todorov · Igor Mordatch · William Agnew · Amy Zhang · Joelle Pineau · Michael Chang · Dumitru Erhan · Sergey Levine · Kimberly Stachenfeld · Marvin Zhang -
2019 Poster: Guided evolutionary strategies: augmenting random search with surrogate gradients »
Niru Maheswaranathan · Luke Metz · George Tucker · Dami Choi · Jascha Sohl-Dickstein -
2019 Poster: On Variational Bounds of Mutual Information »
Ben Poole · Sherjil Ozair · Aäron van den Oord · Alexander Alemi · George Tucker -
2019 Oral: Guided evolutionary strategies: augmenting random search with surrogate gradients »
Niru Maheswaranathan · Luke Metz · George Tucker · Dami Choi · Jascha Sohl-Dickstein -
2019 Oral: On Variational Bounds of Mutual Information »
Ben Poole · Sherjil Ozair · Aäron van den Oord · Alexander Alemi · George Tucker -
2019 Poster: Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables »
Kate Rakelly · Aurick Zhou · Chelsea Finn · Sergey Levine · Deirdre Quillen -
2019 Poster: SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning »
Marvin Zhang · Sharad Vikram · Laura Smith · Pieter Abbeel · Matthew Johnson · Sergey Levine -
2019 Oral: Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables »
Kate Rakelly · Aurick Zhou · Chelsea Finn · Sergey Levine · Deirdre Quillen -
2019 Oral: SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning »
Marvin Zhang · Sharad Vikram · Laura Smith · Pieter Abbeel · Matthew Johnson · Sergey Levine -
2019 Poster: Learning a Prior over Intent via Meta-Inverse Reinforcement Learning »
Kelvin Xu · Ellis Ratner · Anca Dragan · Sergey Levine · Chelsea Finn -
2019 Poster: EMI: Exploration with Mutual Information »
Hyoungseok Kim · Jaekyeom Kim · Yeonwoo Jeong · Sergey Levine · Hyun Oh Song -
2019 Poster: Online Meta-Learning »
Chelsea Finn · Aravind Rajeswaran · Sham Kakade · Sergey Levine -
2019 Poster: Diagnosing Bottlenecks in Deep Q-learning Algorithms »
Justin Fu · Aviral Kumar · Matthew Soh · Sergey Levine -
2019 Oral: Learning a Prior over Intent via Meta-Inverse Reinforcement Learning »
Kelvin Xu · Ellis Ratner · Anca Dragan · Sergey Levine · Chelsea Finn -
2019 Oral: EMI: Exploration with Mutual Information »
Hyoungseok Kim · Jaekyeom Kim · Yeonwoo Jeong · Sergey Levine · Hyun Oh Song -
2019 Oral: Diagnosing Bottlenecks in Deep Q-learning Algorithms »
Justin Fu · Aviral Kumar · Matthew Soh · Sergey Levine -
2019 Oral: Online Meta-Learning »
Chelsea Finn · Aravind Rajeswaran · Sham Kakade · Sergey Levine -
2019 Tutorial: Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning »
Chelsea Finn · Sergey Levine -
2018 Poster: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor »
Tuomas Haarnoja · Aurick Zhou · Pieter Abbeel · Sergey Levine -
2018 Poster: Smoothed Action Value Functions for Learning Gaussian Policies »
Ofir Nachum · Mohammad Norouzi · George Tucker · Dale Schuurmans -
2018 Poster: Regret Minimization for Partially Observable Deep Reinforcement Learning »
Peter Jin · Kurt Keutzer · Sergey Levine -
2018 Poster: Variational Bayesian dropout: pitfalls and fixes »
Jiri Hron · Alexander Matthews · Zoubin Ghahramani -
2018 Oral: Regret Minimization for Partially Observable Deep Reinforcement Learning »
Peter Jin · Kurt Keutzer · Sergey Levine -
2018 Oral: Variational Bayesian dropout: pitfalls and fixes »
Jiri Hron · Alexander Matthews · Zoubin Ghahramani -
2018 Oral: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor »
Tuomas Haarnoja · Aurick Zhou · Pieter Abbeel · Sergey Levine -
2018 Oral: Smoothed Action Value Functions for Learning Gaussian Policies »
Ofir Nachum · Mohammad Norouzi · George Tucker · Dale Schuurmans -
2018 Poster: Structured Evolution with Compact Architectures for Scalable Policy Optimization »
Krzysztof Choromanski · Mark Rowland · Vikas Sindhwani · Richard E Turner · Adrian Weller -
2018 Poster: Discovering Interpretable Representations for Both Deep Generative and Discriminative Models »
Tameem Adel · Zoubin Ghahramani · Adrian Weller -
2018 Poster: Latent Space Policies for Hierarchical Reinforcement Learning »
Tuomas Haarnoja · Kristian Hartikainen · Pieter Abbeel · Sergey Levine -
2018 Poster: Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings »
John Co-Reyes · Yu Xuan Liu · Abhishek Gupta · Benjamin Eysenbach · Pieter Abbeel · Sergey Levine -
2018 Poster: Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control »
Aravind Srinivas · Allan Jabri · Pieter Abbeel · Sergey Levine · Chelsea Finn -
2018 Oral: Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control »
Aravind Srinivas · Allan Jabri · Pieter Abbeel · Sergey Levine · Chelsea Finn -
2018 Oral: Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings »
John Co-Reyes · Yu Xuan Liu · Abhishek Gupta · Benjamin Eysenbach · Pieter Abbeel · Sergey Levine -
2018 Oral: Discovering Interpretable Representations for Both Deep Generative and Discriminative Models »
Tameem Adel · Zoubin Ghahramani · Adrian Weller -
2018 Oral: Latent Space Policies for Hierarchical Reinforcement Learning »
Tuomas Haarnoja · Kristian Hartikainen · Pieter Abbeel · Sergey Levine -
2018 Oral: Structured Evolution with Compact Architectures for Scalable Policy Optimization »
Krzysztof Choromanski · Mark Rowland · Vikas Sindhwani · Richard E Turner · Adrian Weller -
2017 : Lifelong Learning - Panel Discussion »
Sergey Levine · Joelle Pineau · Balaraman Ravindran · Andrei A Rusu -
2017 : Sergey Levine: Self-supervision as a path to lifelong learning »
Sergey Levine -
2017 Poster: Magnetic Hamiltonian Monte Carlo »
Nilesh Tripuraneni · Mark Rowland · Zoubin Ghahramani · Richard E Turner -
2017 Poster: Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning »
Yevgen Chebotar · Karol Hausman · Marvin Zhang · Gaurav Sukhatme · Stefan Schaal · Sergey Levine -
2017 Talk: Magnetic Hamiltonian Monte Carlo »
Nilesh Tripuraneni · Mark Rowland · Zoubin Ghahramani · Richard E Turner -
2017 Talk: Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning »
Yevgen Chebotar · Karol Hausman · Marvin Zhang · Gaurav Sukhatme · Stefan Schaal · Sergey Levine -
2017 Poster: Lost Relatives of the Gumbel Trick »
Matej Balog · Nilesh Tripuraneni · Zoubin Ghahramani · Adrian Weller -
2017 Poster: Modular Multitask Reinforcement Learning with Policy Sketches »
Jacob Andreas · Dan Klein · Sergey Levine -
2017 Poster: Bayesian inference on random simple graphs with power law degree distributions »
Juho Lee · Creighton Heaukulani · Zoubin Ghahramani · Lancelot F. James · Seungjin Choi -
2017 Poster: Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control »
Natasha Jaques · Shixiang Gu · Dzmitry Bahdanau · Jose Miguel Hernandez-Lobato · Richard E Turner · Douglas Eck -
2017 Talk: Lost Relatives of the Gumbel Trick »
Matej Balog · Nilesh Tripuraneni · Zoubin Ghahramani · Adrian Weller -
2017 Talk: Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control »
Natasha Jaques · Shixiang Gu · Dzmitry Bahdanau · Jose Miguel Hernandez-Lobato · Richard E Turner · Douglas Eck -
2017 Talk: Bayesian inference on random simple graphs with power law degree distributions »
Juho Lee · Creighton Heaukulani · Zoubin Ghahramani · Lancelot F. James · Seungjin Choi -
2017 Poster: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks »
Chelsea Finn · Pieter Abbeel · Sergey Levine -
2017 Poster: Automatic Discovery of the Statistical Types of Variables in a Dataset »
Isabel Valera · Zoubin Ghahramani -
2017 Poster: A Birth-Death Process for Feature Allocation »
Konstantina Palla · David Knowles · Zoubin Ghahramani -
2017 Poster: Deep Bayesian Active Learning with Image Data »
Yarin Gal · Riashat Islam · Zoubin Ghahramani -
2017 Poster: Reinforcement Learning with Deep Energy-Based Policies »
Tuomas Haarnoja · Haoran Tang · Pieter Abbeel · Sergey Levine -
2017 Talk: Modular Multitask Reinforcement Learning with Policy Sketches »
Jacob Andreas · Dan Klein · Sergey Levine -
2017 Talk: A Birth-Death Process for Feature Allocation »
Konstantina Palla · David Knowles · Zoubin Ghahramani -
2017 Talk: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks »
Chelsea Finn · Pieter Abbeel · Sergey Levine -
2017 Talk: Deep Bayesian Active Learning with Image Data »
Yarin Gal · Riashat Islam · Zoubin Ghahramani -
2017 Talk: Reinforcement Learning with Deep Energy-Based Policies »
Tuomas Haarnoja · Haoran Tang · Pieter Abbeel · Sergey Levine -
2017 Talk: Automatic Discovery of the Statistical Types of Variables in a Dataset »
Isabel Valera · Zoubin Ghahramani -
2017 Tutorial: Deep Reinforcement Learning, Decision Making, and Control »
Sergey Levine · Chelsea Finn