Many important real-world Reinforcement Learning (RL) problems involve partial observability and require policies with memory. Unfortunately, standard deep RL algorithms for partially observable settings typically condition on the full history of interactions and are notoriously difficult to train. We propose a novel deep, partially observable RL algorithm based on modelling belief states — a technique typically used when solving tabular POMDPs, but one that has traditionally been difficult to apply to more complex environments. Our approach simplifies policy learning by leveraging state information that is available at training time but may not be available at deployment time. We do so in two ways: first, we decouple belief state modelling (via unsupervised learning) from policy optimization (via RL); and second, we propose a representation learning approach to capture a compact set of reward-relevant features of the state. Experiments demonstrate the efficacy of our approach on partially observable domains requiring information seeking and long-term memory.
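The belief states the abstract refers to can be illustrated with the classical tabular Bayes filter, which the paper generalizes beyond the tabular setting. The sketch below is for intuition only, not the paper's method, and every transition and observation probability in it is hypothetical:

```python
# Toy tabular belief-state update for a 2-state POMDP.
# b'(s') ∝ O[s'][obs] * sum_s T[s][action][s'] * b(s)
# All numbers are made up for illustration.

def belief_update(belief, action, obs, T, O):
    """One step of the Bayes filter over discrete states."""
    n = len(belief)
    # Predict the next-state distribution under the transition model.
    predicted = [sum(belief[s] * T[s][action][s2] for s in range(n))
                 for s2 in range(n)]
    # Weight by the likelihood of the received observation, then normalize.
    unnorm = [O[s2][obs] * predicted[s2] for s2 in range(n)]
    z = sum(unnorm)
    return [p / z for p in unnorm]

# Hypothetical 2-state, 1-action, 2-observation POMDP.
T = [[[0.9, 0.1]],   # transitions from state 0 under action 0
     [[0.2, 0.8]]]   # transitions from state 1 under action 0
O = [[0.8, 0.2],     # observation probabilities in state 0
     [0.3, 0.7]]     # observation probabilities in state 1

b = belief_update([0.5, 0.5], action=0, obs=1, T=T, O=O)
# b ≈ [0.259, 0.741]: observing obs=1 shifts belief toward state 1.
```

Maintaining this distribution exactly requires known dynamics and scales poorly, which is why the paper instead learns a belief representation via unsupervised learning, aided by privileged state information during training.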
Author Information
Andrew Wang (University of Toronto)
Andrew C Li (University of Toronto and Vector Institute)
Toryn Q Klassen (University of Toronto)
Rodrigo A Toro Icarte (University of Toronto and Vector Institute)
I am a PhD student in the knowledge representation group at the University of Toronto. I am also a member of the Canadian Artificial Intelligence Association and the Vector Institute. My supervisor is Sheila McIlraith. I did my undergrad in Computer Engineering and my MSc in Computer Science at Pontificia Universidad Católica de Chile (PUC). My master's degree was co-supervised by Alvaro Soto and Jorge Baier. While I was at PUC, I taught the undergraduate course "Introduction to Programming Languages."
Sheila McIlraith (University of Toronto and Vector Institute)
Sheila McIlraith is a Professor in the Department of Computer Science at the University of Toronto, a Canada CIFAR AI Chair (Vector Institute), and a Research Lead at the Schwartz Reisman Institute for Technology and Society. McIlraith's research is in the area of AI sequential decision making broadly construed, with a focus on human-compatible AI. McIlraith is a Fellow of the ACM and AAAI.
More from the Same Authors
- 2021 : AppBuddy: Learning to Accomplish Tasks in Mobile Apps via Reinforcement Learning »
  Maayan Shvo · Zhiming Hu · Rodrigo A Toro Icarte · Iqbal Mohomed · Allan Jepson · Sheila McIlraith
- 2022 : Exploring Long-Horizon Reasoning with Deep RL in Combinatorially Hard Tasks »
  Andrew C Li · Pashootan Vaezipoor · Rodrigo A Toro Icarte · Sheila McIlraith
- 2022 : You Can’t Count on Luck: Why Decision Transformers Fail in Stochastic Environments »
  Keiran Paster · Sheila McIlraith · Jimmy Ba
- 2023 : A Generative Model for Text Control in Minecraft »
  Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith
- 2021 Poster: LTL2Action: Generalizing LTL Instructions for Multi-Task RL »
  Pashootan Vaezipoor · Andrew C Li · Rodrigo A Toro Icarte · Sheila McIlraith
- 2021 Spotlight: LTL2Action: Generalizing LTL Instructions for Multi-Task RL »
  Pashootan Vaezipoor · Andrew C Li · Rodrigo A Toro Icarte · Sheila McIlraith
- 2018 Poster: Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning »
  Rodrigo A Toro Icarte · Toryn Q Klassen · Richard Valenzano · Sheila McIlraith
- 2018 Oral: Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning »
  Rodrigo A Toro Icarte · Toryn Q Klassen · Richard Valenzano · Sheila McIlraith