Abstract
Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, that the underlying Markov decision process is stationary. However, in many real-world applications this assumption is violated, and using existing algorithms may result in a performance lag. To proactively search for a good future policy, we present a policy gradient algorithm that maximizes a forecast of future performance. This forecast is obtained by fitting a curve to counterfactual estimates of policy performance over time, without explicitly modeling the underlying non-stationarity. The resulting algorithm amounts to a non-uniform reweighting of past data, and we observe that minimizing performance over some of the data from past episodes can be beneficial when searching for a policy that maximizes future performance. We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques on three simulated problems motivated by real-world applications.
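The core forecasting idea in the abstract can be illustrated with a minimal sketch (this is an illustration, not the authors' implementation): given per-episode performance estimates of the current policy — which in the paper's setting would come from counterfactual, importance-sampling-based evaluation on past episodes — fit a trend curve and extrapolate it one step to predict future performance. The function name `forecast_future_performance` and the polynomial-basis choice are assumptions for this sketch.

```python
import numpy as np

def forecast_future_performance(j_hat, degree=2):
    """Fit a least-squares polynomial of the given degree to past
    performance estimates j_hat[0..k-1] (one per episode) and return
    the extrapolated performance forecast for episode k.

    A hedged sketch of the forecasting step described in the abstract;
    the paper's counterfactual estimates are replaced here by whatever
    numbers the caller supplies.
    """
    j_hat = np.asarray(j_hat, dtype=float)
    k = len(j_hat)
    episodes = np.arange(k)
    coeffs = np.polyfit(episodes, j_hat, deg=degree)  # least-squares fit
    return np.polyval(coeffs, k)  # evaluate the fit at the next episode

# Toy usage: performance drifting upward over 10 episodes.
estimates = 0.1 * np.arange(10)
print(forecast_future_performance(estimates))  # extrapolated trend value
```

Because the forecast is a linear function of the past estimates (least squares), maximizing it with a policy gradient amounts to the non-uniform reweighting of past data that the abstract describes.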
Author Information
Yash Chandak (University of Massachusetts Amherst)
Georgios Theocharous (Adobe Research)
Shiv Shankar (University of Massachusetts)
Martha White (University of Alberta)
Sridhar Mahadevan (Adobe Research)
Philip Thomas (University of Massachusetts Amherst)
More from the Same Authors
- 2023 : In-Context Decision-Making from Supervised Pretraining
  Jonathan Lee · Annie Xie · Aldo Pacchiano · Yash Chandak · Chelsea Finn · Ofir Nachum · Emma Brunskill
- 2023 Poster: Understanding Self-Predictive Learning for Reinforcement Learning
  Yunhao Tang · Zhaohan Guo · Pierre Richemond · Bernardo Avila Pires · Yash Chandak · Remi Munos · Mark Rowland · Mohammad Gheshlaghi Azar · Charline Le Lan · Clare Lyle · Andras Gyorgy · Shantanu Thakoor · Will Dabney · Bilal Piot · Daniele Calandriello · Michal Valko
- 2023 Poster: Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition
  Yash Chandak · Shantanu Thakoor · Zhaohan Guo · Yunhao Tang · Remi Munos · Will Dabney · Diana Borsa
- 2023 Poster: Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning
  Brett Daley · Martha White · Christopher Amato · Marlos C. Machado
- 2022 Poster: A Temporal-Difference Approach to Policy Gradient Estimation
  Samuele Tosatto · Andrew Patterson · Martha White · A. Mahmood
- 2022 Spotlight: A Temporal-Difference Approach to Policy Gradient Estimation
  Samuele Tosatto · Andrew Patterson · Martha White · A. Mahmood
- 2021 : RL + Recommender Systems Panel
  Alekh Agarwal · Ed Chi · Maria Dimakopoulou · Georgios Theocharous · Minmin Chen · Lihong Li
- 2021 Spotlight: Towards Practical Mean Bounds for Small Samples
  My Phan · Philip Thomas · Erik Learned-Miller
- 2021 Poster: Towards Practical Mean Bounds for Small Samples
  My Phan · Philip Thomas · Erik Learned-Miller
- 2021 Poster: Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods
  Chris Nota · Philip Thomas · Bruno C. da Silva
- 2021 Spotlight: Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods
  Chris Nota · Philip Thomas · Bruno C. da Silva
- 2021 Poster: High Confidence Generalization for Reinforcement Learning
  James Kostas · Yash Chandak · Scott Jordan · Georgios Theocharous · Philip Thomas
- 2021 Spotlight: High Confidence Generalization for Reinforcement Learning
  James Kostas · Yash Chandak · Scott Jordan · Georgios Theocharous · Philip Thomas
- 2020 : Panel Discussion
  Eric Eaton · Martha White · Doina Precup · Irina Rish · Harm van Seijen
- 2020 : QA for invited talk 5 White
  Martha White
- 2020 : Invited talk 5 White
  Martha White
- 2020 : An Off-policy Policy Gradient Theorem: A Tale About Weightings - Martha White
  Martha White
- 2020 : Speaker Panel
  Csaba Szepesvari · Martha White · Sham Kakade · Gergely Neu · Shipra Agrawal · Akshay Krishnamurthy
- 2020 Poster: Gradient Temporal-Difference Learning with Regularized Corrections
  Sina Ghiassian · Andrew Patterson · Shivam Garg · Dhawal Gupta · Adam White · Martha White
- 2020 Poster: Selective Dyna-style Planning Under Limited Model Capacity
  Zaheer Abbas · Samuel Sokota · Erin Talvitie · Martha White
- 2020 Poster: Asynchronous Coagent Networks
  James Kostas · Chris Nota · Philip Thomas
- 2020 Poster: Evaluating the Performance of Reinforcement Learning Algorithms
  Scott Jordan · Yash Chandak · Daniel Cohen · Mengxue Zhang · Philip Thomas
- 2019 Workshop: Exploration in Reinforcement Learning Workshop
  Benjamin Eysenbach · Surya Bhupatiraju · Shixiang Gu · Harrison Edwards · Martha White · Pierre-Yves Oudeyer · Kenneth Stanley · Emma Brunskill
- 2019 Poster: Concentration Inequalities for Conditional Value at Risk
  Philip Thomas · Erik Learned-Miller
- 2019 Oral: Concentration Inequalities for Conditional Value at Risk
  Philip Thomas · Erik Learned-Miller
- 2019 Poster: Learning Action Representations for Reinforcement Learning
  Yash Chandak · Georgios Theocharous · James Kostas · Scott Jordan · Philip Thomas
- 2019 Oral: Learning Action Representations for Reinforcement Learning
  Yash Chandak · Georgios Theocharous · James Kostas · Scott Jordan · Philip Thomas
- 2018 Poster: Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control
  Yangchen Pan · Amir-massoud Farahmand · Martha White · Saleh Nabi · Piyush Grover · Daniel Nikovski
- 2018 Oral: Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control
  Yangchen Pan · Amir-massoud Farahmand · Martha White · Saleh Nabi · Piyush Grover · Daniel Nikovski
- 2018 Poster: Improving Regression Performance with Distributional Losses
  Ehsan Imani · Martha White
- 2018 Poster: Decoupling Gradient-Like Learning Rules from Representations
  Philip Thomas · Christoph Dann · Emma Brunskill
- 2018 Oral: Decoupling Gradient-Like Learning Rules from Representations
  Philip Thomas · Christoph Dann · Emma Brunskill
- 2018 Oral: Improving Regression Performance with Distributional Losses
  Ehsan Imani · Martha White