Timezone: »
In this paper we propose Reward Machines - a type of finite state machine that supports the specification of reward functions while exposing reward function structure to the learner and supporting decomposition. We then present Q-Learning for Reward Machines (QRM), an algorithm which appropriately decomposes the reward machine and uses off-policy q-learning to simultaneously learn subpolicies for the different components. QRM is guaranteed to converge to an optimal policy in the tabular case, in contrast to Hierarchical Reinforcement Learning methods which might converge to suboptimal policies. We demonstrate this behavior experimentally in two discrete domains. We also show how function approximation methods like neural networks can be incorporated into QRM, and that doing so can find better policies more quickly than hierarchical methods in a domain with a continuous state space.
Author Information
Rodrigo A Toro Icarte (University of Toronto)
I am a PhD student in the knowledge representation group at the University of Toronto. I am also a member of the Canadian Artificial Intelligence Association and the Vector Institute. My supervisor is Sheila McIlraith. I did my undergrad in Computer Engineering and MSc in Computer Science at Pontificia Universidad Católica de Chile (PUC). My master's degree was co-supervised by Alvaro Soto and Jorge Baier. While I was at PUC, I instructed the undergraduate course "Introduction to Programming Languages."
Toryn Q Klassen (University of Toronto)
Richard Valenzano (Element AI)
Sheila McIlraith (University of Toronto)
Sheila McIlraith is a Professor in the Department of Computer Science at the University of Toronto, a Canada CIFAR AI Chair (Vector Institute), and a Research Lead at the Schwartz Reisman Institute for Technology and Society. McIlraith's research is in the area of AI sequential decision making broadly construed, with a focus on human-compatible AI. McIlraith is a Fellow of the ACM and AAAI.
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Poster: Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning »
Fri. Jul 13th 04:15 -- 07:00 PM Room Hall B #147
More from the Same Authors
-
2021 : AppBuddy: Learning to Accomplish Tasks in Mobile Apps via Reinforcement Learning »
Maayan Shvo · Zhiming Hu · Rodrigo A Toro Icarte · Iqbal Mohomed · Allan Jepson · Sheila McIlraith -
2022 : Exploring Long-Horizon Reasoning with Deep RL in Combinatorially Hard Tasks »
Andrew C Li · Pashootan Vaezipoor · Rodrigo A Toro Icarte · Sheila McIlraith -
2022 : You Can’t Count on Luck: Why Decision Transformers Fail in Stochastic Environments »
Keiran Paster · Sheila McIlraith · Jimmy Ba -
2023 : A Generative Model for Text Control in Minecraft »
Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith -
2023 : A Generative Model for Text Control in Minecraft »
Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith -
2023 Poster: Learning Belief Representations for Partially Observable Deep RL »
Andrew Wang · Andrew C Li · Toryn Q Klassen · Rodrigo A Toro Icarte · Sheila McIlraith -
2021 Poster: LTL2Action: Generalizing LTL Instructions for Multi-Task RL »
Pashootan Vaezipoor · Andrew C Li · Rodrigo A Toro Icarte · Sheila McIlraith -
2021 Spotlight: LTL2Action: Generalizing LTL Instructions for Multi-Task RL »
Pashootan Vaezipoor · Andrew C Li · Rodrigo A Toro Icarte · Sheila McIlraith