We introduce learning and planning algorithms for average-reward MDPs, including 1) the first general proven-convergent off-policy model-free control algorithm without reference states, 2) the first proven-convergent off-policy model-free prediction algorithm, and 3) the first off-policy learning algorithm that converges to the actual value function rather than to the value function plus an offset. All of our algorithms are based on using the temporal-difference error rather than the conventional error when updating the estimate of the average reward. Our proof techniques are a slight generalization of those by Abounadi, Bertsekas, and Borkar (2001). In experiments with an Access-Control Queuing Task, we show some of the difficulties that can arise when using methods that rely on reference states and argue that our new algorithms are significantly easier to use.
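The central idea in the abstract, updating the average-reward estimate with the temporal-difference error, can be illustrated with a minimal tabular sketch. This is an illustration under stated assumptions, not the authors' implementation: the function name `differential_q_update`, the step-size parameters `alpha` and `eta`, and the one-state toy task are all hypothetical choices made for this example.

```python
def differential_q_update(q, avg_reward, s, a, r, s_next, alpha=0.1, eta=1.0):
    """Apply one tabular update in place and return the new average-reward estimate.

    q is a list of per-state lists of action values (hypothetical layout).
    alpha and eta are assumed step-size parameters for this sketch.
    """
    # Temporal-difference error for the average-reward setting:
    # the reward is measured relative to the current average-reward estimate.
    td_error = r - avg_reward + max(q[s_next]) - q[s][a]
    q[s][a] += alpha * td_error
    # The average-reward estimate is also driven by the TD error.
    # (A conventional alternative would use r - avg_reward here instead.)
    avg_reward += eta * alpha * td_error
    return avg_reward


def run_one_state_task(steps=2000, reward=2.0, alpha=0.1, eta=1.0):
    """Toy continuing task (an assumption): one state, one action, constant reward."""
    q = [[0.0]]
    avg_reward = 0.0
    for _ in range(steps):
        avg_reward = differential_q_update(
            q, avg_reward, s=0, a=0, r=reward, s_next=0, alpha=alpha, eta=eta
        )
    return q, avg_reward


if __name__ == "__main__":
    q, avg_reward = run_one_state_task()
    # The estimate should move toward the true reward rate of the toy task.
    print("estimated average reward:", avg_reward)
```

Because the update uses only the observed transition `(s, a, r, s_next)` and never a designated reference state, the same function can be called on data generated by any behavior policy, which is the off-policy property the abstract emphasizes.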
Author Information
Yi Wan (University of Alberta)
Abhishek Naik (University of Alberta; Amii)
Richard Sutton (DeepMind / Univ Alberta)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Spotlight: Learning and Planning in Average-Reward Markov Decision Processes
  Wed. Jul 21st 01:30 -- 01:35 AM
More from the Same Authors
- 2022 Poster: Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods
  Yi Wan · Ali Rahimi-Kalahroudi · Janarthanan Rajendran · Ida Momennejad · Sarath Chandar · Harm van Seijen
- 2022 Spotlight: Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods
  Yi Wan · Ali Rahimi-Kalahroudi · Janarthanan Rajendran · Ida Momennejad · Sarath Chandar · Harm van Seijen
- 2022 Social: Designing an RL system toward AGI
  Yi Wan · Alex Ayoub
- 2021 Poster: Average-Reward Off-Policy Policy Evaluation with Function Approximation
  Shangtong Zhang · Yi Wan · Richard Sutton · Shimon Whiteson
- 2021 Spotlight: Average-Reward Off-Policy Policy Evaluation with Function Approximation
  Shangtong Zhang · Yi Wan · Richard Sutton · Shimon Whiteson
- 2021 Social: RL Social
  Dibya Ghosh · Hager Radi · Derek Li · Alex Ayoub · Erfan Miahi · Rishabh Agarwal · Charline Le Lan · Abhishek Naik · John D. Martin · Shruti Mishra · Adrien Ali Taiga
- 2021 Social: Continuing (Non-episodic) RL Problems
  Yi Wan
- 2020: Q&A by Rich Sutton
  Richard Sutton · Shagun Sodhani · Sarath Chandar
- 2020: The FOAK Cycle for Model-based Life-long Learning by Rich Sutton
  Richard Sutton