Poster
Separable value functions across time-scales
Joshua Romoff · Peter Henderson · Ahmed Touati · Yann Ollivier · Joelle Pineau · Emma Brunskill
In many finite-horizon episodic reinforcement learning (RL) settings, it is desirable to optimize for the undiscounted return. In settings like Atari, for instance, the goal is to collect the most points while staying alive in the long run. Yet it may be mathematically difficult (or even intractable) to learn with this target. As such, temporal discounting is often applied to optimize over a shorter effective planning horizon. This comes at the cost of potentially biasing the optimization target away from the undiscounted goal. In settings where this bias is unacceptable, where the system must optimize for longer horizons at higher discount factors, the target of the value function approximator may increase in variance, leading to difficulties in learning. We present an extension of temporal difference (TD) learning, which we call TD($\Delta$), that breaks down a value function into a series of components based on the differences between value functions with smaller discount factors. The separation of a longer-horizon value function into these components has useful properties in scalability and performance. We discuss these properties and show theoretical and empirical improvements over standard TD learning in certain settings.
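The decomposition described in the abstract can be illustrated with a small sketch. It assumes one reading of "differences between value functions with smaller discount factors": for an increasing sequence of discount factors, the first component is the value function at the smallest discount factor, and each later component is the difference between value functions at consecutive discount factors. The 3-state Markov reward process, the discount factors, and all names (`evaluate`, `delta_components`, `bellman_residual`) are hypothetical and chosen only for illustration. The sketch computes values by iterative policy evaluation rather than by TD learning, so it shows the structure of the decomposition, not the TD($\Delta$) learning algorithm itself.

```python
# Hypothetical 3-state Markov reward process (illustrative, not from the paper).
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
]
R = [0.0, 0.0, 1.0]  # expected reward received in each state


def evaluate(gamma, sweeps=2000):
    """Iterative policy evaluation of V_gamma on the fixed chain above."""
    v = [0.0] * len(R)
    for _ in range(sweeps):
        v = [R[s] + gamma * sum(P[s][t] * v[t] for t in range(len(R)))
             for s in range(len(R))]
    return v


def delta_components(gammas):
    """Assumed decomposition: W_0 = V_{g_0}, W_z = V_{g_z} - V_{g_{z-1}}."""
    values = [evaluate(g) for g in gammas]
    comps = [values[0]]
    for prev, cur in zip(values, values[1:]):
        comps.append([c - p for c, p in zip(cur, prev)])
    return values, comps


GAMMAS = [0.5, 0.9, 0.99]  # increasing discount factors (assumed)
VALUES, COMPONENTS = delta_components(GAMMAS)

# Summing the components telescopes back to the longest-horizon value function.
RECONSTRUCTED = [sum(w[s] for w in COMPONENTS) for s in range(len(R))]


def bellman_residual(z):
    """Largest violation, over states, of the component equation for z >= 1:
    W_z(s) = E[(g_z - g_{z-1}) * V_{g_{z-1}}(s') + g_z * W_z(s')].
    """
    gap = GAMMAS[z] - GAMMAS[z - 1]
    worst = 0.0
    for s in range(len(R)):
        backup = sum(
            P[s][t] * (gap * VALUES[z - 1][t] + GAMMAS[z] * COMPONENTS[z][t])
            for t in range(len(R))
        )
        worst = max(worst, abs(COMPONENTS[z][s] - backup))
    return worst
```

The component equation checked by `bellman_residual` follows from subtracting the Bellman equations of two consecutive value functions. Under these assumptions, each component can be bootstrapped from shorter-horizon estimates, and no reward term appears in the components after the first.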
Author Information
Joshua Romoff (McGill University)
Peter Henderson (Stanford University)
Ahmed Touati (MILA / FAIR)
Yann Ollivier (Facebook Artificial Intelligence Research)
Joelle Pineau (McGill University / Facebook)
Emma Brunskill (Stanford University)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Oral: Separable value functions across time-scales »
Tue. Jun 11th 11:00 -- 11:20 PM Room 104
More from the Same Authors
-
2021 : Model-based Offline Reinforcement Learning with Local Misspecification »
Kefan Dong · Ramtin Keramati · Emma Brunskill -
2021 : Estimating Optimal Policy Value in Linear Contextual Bandits beyond Gaussianity »
Jonathan Lee · Weihao Kong · Aldo Pacchiano · Vidya Muthukumar · Emma Brunskill -
2021 : Avoiding Overfitting to the Importance Weights in Offline Policy Optimization »
Yao Liu · Emma Brunskill -
2021 : Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning »
Andrea Zanette · Martin Wainwright · Emma Brunskill -
2022 : Giving Complex Feedback in Online Student Learning with Meta-Exploration »
Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn -
2022 : Giving Feedback on Interactive Student Programs with Meta-Exploration »
Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn -
2022 : Invited Talk: Emma Brunskill »
Emma Brunskill -
2021 Workshop: ICML 2021 Workshop on Unsupervised Reinforcement Learning »
Feryal Behbahani · Joelle Pineau · Lerrel Pinto · Roberta Raileanu · Aravind Srinivas · Denis Yarats · Amy Zhang -
2021 Poster: OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation »
Jongmin Lee · Wonseok Jeon · Byung-Jun Lee · Joelle Pineau · Kee-Eung Kim -
2021 Spotlight: OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation »
Jongmin Lee · Wonseok Jeon · Byung-Jun Lee · Joelle Pineau · Kee-Eung Kim -
2020 : Q&A: Peter Henderson »
Peter Henderson · Mayoore Jaiswal · Ryan Lowe -
2020 : Invited Talk: Peter Henderson »
Peter Henderson -
2020 Workshop: Theoretical Foundations of Reinforcement Learning »
Emma Brunskill · Thodoris Lykouris · Max Simchowitz · Wen Sun · Mengdi Wang -
2020 Workshop: MLRetrospectives: A Venue for Self-Reflection in ML Research »
Jessica Forde · Jesse Dodge · Mayoore Jaiswal · Rosanne Liu · Ryan Lowe · Joelle Pineau · Yoshua Bengio -
2020 Poster: Online Learned Continual Compression with Adaptive Quantization Modules »
Lucas Caccia · Eugene Belilovsky · Massimo Caccia · Joelle Pineau -
2020 Poster: Constrained Markov Decision Processes via Backward Value Functions »
Harsh Satija · Philip Amortila · Joelle Pineau -
2020 Poster: Interference and Generalization in Temporal Difference Learning »
Emmanuel Bengio · Joelle Pineau · Doina Precup -
2020 Poster: Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions »
Omer Gottesman · Joseph Futoma · Yao Liu · Sonali Parbhoo · Leo Celi · Emma Brunskill · Finale Doshi-Velez -
2020 Poster: Learning Near Optimal Policies with Low Inherent Bellman Error »
Andrea Zanette · Alessandro Lazaric · Mykel Kochenderfer · Emma Brunskill -
2020 Poster: Invariant Causal Prediction for Block MDPs »
Amy Zhang · Clare Lyle · Shagun Sodhani · Angelos Filos · Marta Kwiatkowska · Joelle Pineau · Yarin Gal · Doina Precup -
2020 Poster: Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling »
Yao Liu · Pierre-Luc Bacon · Emma Brunskill -
2019 Workshop: Exploration in Reinforcement Learning Workshop »
Benjamin Eysenbach · Surya Bhupatiraju · Shixiang Gu · Harrison Edwards · Martha White · Pierre-Yves Oudeyer · Kenneth Stanley · Emma Brunskill -
2019 : Emma Brunskill (Stanford) - Minimizing & Understanding the Data Needed to Learn to Make Good Sequences of Decisions »
Emma Brunskill -
2019 : panel discussion with Craig Boutilier (Google Research), Emma Brunskill (Stanford), Chelsea Finn (Google Brain, Stanford, UC Berkeley), Mohammad Ghavamzadeh (Facebook AI), John Langford (Microsoft Research) and David Silver (Deepmind) »
Peter Stone · Craig Boutilier · Emma Brunskill · Chelsea Finn · John Langford · David Silver · Mohammad Ghavamzadeh -
2019 Workshop: Generative Modeling and Model-Based Reasoning for Robotics and AI »
Aravind Rajeswaran · Emanuel Todorov · Igor Mordatch · William Agnew · Amy Zhang · Joelle Pineau · Michael Chang · Dumitru Erhan · Sergey Levine · Kimberly Stachenfeld · Marvin Zhang -
2019 Poster: TarMAC: Targeted Multi-Agent Communication »
Abhishek Das · Theophile Gervet · Joshua Romoff · Dhruv Batra · Devi Parikh · Michael Rabbat · Joelle Pineau -
2019 Oral: TarMAC: Targeted Multi-Agent Communication »
Abhishek Das · Theophile Gervet · Joshua Romoff · Dhruv Batra · Devi Parikh · Michael Rabbat · Joelle Pineau -
2019 Poster: Combining parametric and nonparametric models for off-policy evaluation »
Omer Gottesman · Yao Liu · Scott Sussex · Emma Brunskill · Finale Doshi-Velez -
2019 Oral: Combining parametric and nonparametric models for off-policy evaluation »
Omer Gottesman · Yao Liu · Scott Sussex · Emma Brunskill · Finale Doshi-Velez -
2019 Poster: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2019 Poster: First-Order Adversarial Vulnerability of Neural Networks and Input Dimension »
Carl-Johann Simon-Gabriel · Yann Ollivier · Leon Bottou · Bernhard Schölkopf · David Lopez-Paz -
2019 Poster: White-box vs Black-box: Bayes Optimal Strategies for Membership Inference »
Alexandre Sablayrolles · Matthijs Douze · Cordelia Schmid · Yann Ollivier · Herve Jegou -
2019 Poster: Making Deep Q-learning methods robust to time discretization »
Corentin Tallec · Leonard Blier · Yann Ollivier -
2019 Poster: Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds »
Andrea Zanette · Emma Brunskill -
2019 Oral: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2019 Oral: Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds »
Andrea Zanette · Emma Brunskill -
2019 Oral: White-box vs Black-box: Bayes Optimal Strategies for Membership Inference »
Alexandre Sablayrolles · Matthijs Douze · Cordelia Schmid · Yann Ollivier · Herve Jegou -
2019 Oral: Making Deep Q-learning methods robust to time discretization »
Corentin Tallec · Leonard Blier · Yann Ollivier -
2019 Oral: First-Order Adversarial Vulnerability of Neural Networks and Input Dimension »
Carl-Johann Simon-Gabriel · Yann Ollivier · Leon Bottou · Bernhard Schölkopf · David Lopez-Paz -
2018 Poster: Mixed batches and symmetric discriminators for GAN training »
Thomas Lucas · Corentin Tallec · Yann Ollivier · Jakob Verbeek -
2018 Oral: Mixed batches and symmetric discriminators for GAN training »
Thomas Lucas · Corentin Tallec · Yann Ollivier · Jakob Verbeek -
2018 Poster: Focused Hierarchical RNNs for Conditional Sequence Processing »
Rosemary Nan Ke · Konrad Zolna · Alessandro Sordoni · Zhouhan Lin · Adam Trischler · Yoshua Bengio · Joelle Pineau · Laurent Charlin · Christopher Pal -
2018 Poster: Decoupling Gradient-Like Learning Rules from Representations »
Philip Thomas · Christoph Dann · Emma Brunskill -
2018 Poster: Convergent Tree Backup and Retrace with Function Approximation »
Ahmed Touati · Pierre-Luc Bacon · Doina Precup · Pascal Vincent -
2018 Oral: Decoupling Gradient-Like Learning Rules from Representations »
Philip Thomas · Christoph Dann · Emma Brunskill -
2018 Oral: Focused Hierarchical RNNs for Conditional Sequence Processing »
Rosemary Nan Ke · Konrad Zolna · Alessandro Sordoni · Zhouhan Lin · Adam Trischler · Yoshua Bengio · Joelle Pineau · Laurent Charlin · Christopher Pal -
2018 Oral: Convergent Tree Backup and Retrace with Function Approximation »
Ahmed Touati · Pierre-Luc Bacon · Doina Precup · Pascal Vincent -
2018 Poster: An Inference-Based Policy Gradient Method for Learning Options »
Matthew Smith · Herke van Hoof · Joelle Pineau -
2018 Poster: Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs »
Andrea Zanette · Emma Brunskill -
2018 Oral: An Inference-Based Policy Gradient Method for Learning Options »
Matthew Smith · Herke van Hoof · Joelle Pineau -
2018 Oral: Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs »
Andrea Zanette · Emma Brunskill -
2017 Workshop: Reproducibility in Machine Learning Research »
Rosemary Nan Ke · Anirudh Goyal · Alex Lamb · Joelle Pineau · Samy Bengio · Yoshua Bengio -
2017 : Lifelong Learning - Panel Discussion »
Sergey Levine · Joelle Pineau · Balaraman Ravindran · Andrei A Rusu -
2017 : Joelle Pineau: A few modest insights from my lifelong learning »
Joelle Pineau