Poster
Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning
Brett Daley · Martha White · Christopher Amato · Marlos C. Machado
Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging. Classically, off-policy bias is corrected in a per-decision manner: past temporal-difference errors are re-weighted by the instantaneous Importance Sampling (IS) ratio after each action via eligibility traces. Many off-policy algorithms rely on this mechanism, along with differing protocols for cutting the IS ratios (traces) to combat the variance of the IS estimator. Unfortunately, once a trace has been cut, the effect cannot be easily reversed. This has led to the development of credit-assignment strategies that account for multiple past experiences at a time. These trajectory-aware methods have not been extensively analyzed, and their theoretical justification remains uncertain. In this paper, we propose a multistep operator that unifies per-decision and trajectory-aware methods. We prove convergence conditions for our operator in the tabular setting, establishing the first guarantees for several existing methods as well as many new ones. Finally, we introduce Recency-Bounded Importance Sampling (RBIS), which leverages trajectory awareness to perform robustly across $\lambda$-values in an off-policy control task.
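The per-decision mechanism the abstract describes is compact enough to sketch. Below is a minimal tabular update showing how past temporal-difference errors are re-weighted by the instantaneous IS ratio through eligibility traces, and how clipping that ratio (here at 1, one common cutting protocol, as in Retrace) "cuts" the trace. The function name, array shapes, and hyperparameter values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def per_decision_trace_step(q, z, s, a, td_error, rho,
                            gamma=0.99, lam=0.9, alpha=0.1):
    """One off-policy eligibility-trace update (illustrative sketch only).

    q   : (n_states, n_actions) action-value table
    z   : (n_states, n_actions) eligibility traces
    rho : instantaneous IS ratio pi(a|s) / mu(a|s)
    """
    # Per-decision correction: decay every trace by gamma * lambda and the
    # (clipped) IS ratio. Clipping at 1 is a Retrace-style cutting protocol;
    # once a small rho shrinks the traces, the cut cannot easily be reversed.
    z *= gamma * lam * min(rho, 1.0)
    z[s, a] += 1.0  # accumulating trace for the current state-action pair
    # Broadcast the current TD error to all pairs in proportion to eligibility.
    q += alpha * td_error * z
    return q, z
```

The irreversibility of the trace cut in the decay line above is exactly the limitation the abstract points to: trajectory-aware methods such as the paper's RBIS instead compute the trace as a function of multiple past experiences at a time, rather than this purely recursive per-decision product.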
Author Information
Brett Daley (University of Alberta)
Martha White (University of Alberta)
Christopher Amato (Northeastern University)
Marlos C. Machado (University of Alberta)
More from the Same Authors
- 2023 Poster: Deep Laplacian-based Options for Temporally-Extended Exploration »
  Martin Klissarov · Marlos C. Machado
- 2022 Poster: A Temporal-Difference Approach to Policy Gradient Estimation »
  Samuele Tosatto · Andrew Patterson · Martha White · A. Mahmood
- 2022 Spotlight: A Temporal-Difference Approach to Policy Gradient Estimation »
  Samuele Tosatto · Andrew Patterson · Martha White · A. Mahmood
- 2020: Panel Discussion »
  Eric Eaton · Martha White · Doina Precup · Irina Rish · Harm van Seijen
- 2020: QA for invited talk 5 White »
  Martha White
- 2020: Invited talk 5 White »
  Martha White
- 2020: An Off-policy Policy Gradient Theorem: A Tale About Weightings - Martha White »
  Martha White
- 2020: Speaker Panel »
  Csaba Szepesvari · Martha White · Sham Kakade · Gergely Neu · Shipra Agrawal · Akshay Krishnamurthy
- 2020 Poster: Gradient Temporal-Difference Learning with Regularized Corrections »
  Sina Ghiassian · Andrew Patterson · Shivam Garg · Dhawal Gupta · Adam White · Martha White
- 2020 Poster: Selective Dyna-style Planning Under Limited Model Capacity »
  Zaheer Abbas · Samuel Sokota · Erin Talvitie · Martha White
- 2020 Poster: Optimizing for the Future in Non-Stationary MDPs »
  Yash Chandak · Georgios Theocharous · Shiv Shankar · Martha White · Sridhar Mahadevan · Philip Thomas
- 2019 Workshop: Exploration in Reinforcement Learning Workshop »
  Benjamin Eysenbach · Surya Bhupatiraju · Shixiang Gu · Harrison Edwards · Martha White · Pierre-Yves Oudeyer · Kenneth Stanley · Emma Brunskill
- 2018 Poster: Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control »
  Yangchen Pan · Amir-massoud Farahmand · Martha White · Saleh Nabi · Piyush Grover · Daniel Nikovski
- 2018 Oral: Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control »
  Yangchen Pan · Amir-massoud Farahmand · Martha White · Saleh Nabi · Piyush Grover · Daniel Nikovski
- 2018 Poster: Improving Regression Performance with Distributional Losses »
  Ehsan Imani · Martha White
- 2018 Oral: Improving Regression Performance with Distributional Losses »
  Ehsan Imani · Martha White