We introduce Recurrent Predictive State Policy (RPSP) networks, a recurrent architecture that brings insights from predictive state representations to reinforcement learning in partially observable environments. Predictive state policy networks consist of a recursive filter, which keeps track of a belief about the state of the environment, and a reactive policy that directly maps beliefs to actions to maximize the cumulative reward. The recursive filter leverages predictive state representations (PSRs) (Rosencrantz & Gordon, 2004; Sun et al., 2016) by modeling the predictive state: a prediction of the distribution of future observations conditioned on history and future actions. This representation gives rise to a rich class of statistically consistent algorithms (Hefny et al., 2017) for initializing the recursive filter. The predictive state serves as an equivalent representation of a belief state. Therefore, the policy component of the RPSP network can be purely reactive, which simplifies training while still allowing optimal behavior. Moreover, we use the PSR interpretation during training as well, by incorporating prediction error in the loss function. The entire network (recursive filter and reactive policy) is still differentiable and can be trained using gradient-based methods. We optimize our policy using a combination of policy gradient based on rewards (Williams, 1992) and gradient descent based on prediction error. We show the efficacy of RPSP networks on a set of robotic control tasks from OpenAI Gym. Empirically, RPSP networks perform well compared with memory-preserving networks such as GRUs, as well as with finite-memory models, and are the best-performing method overall.
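The structure described in the abstract (a recursive filter feeding a reactive policy, trained on a loss that mixes a reward-based policy-gradient term with a prediction-error term) can be illustrated with a small sketch. This is not the authors' implementation: the tanh filter update, the linear maps `W_q`, `W_pi`, and `W_o`, the fixed-variance Gaussian policy, and the weights `w_pg` and `w_pred` are all assumptions chosen for brevity. The paper's filter is a PSR-based update initialized by a consistent estimator, which this sketch does not attempt to reproduce.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def init_params(belief_dim, act_dim, obs_dim, rng):
    """Small random weights for the three hypothetical linear maps."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    W_q = mat(belief_dim, belief_dim + act_dim + obs_dim)  # filter update
    W_pi = mat(act_dim, belief_dim)                        # reactive policy
    W_o = mat(obs_dim, belief_dim + act_dim)               # observation predictor
    return W_q, W_pi, W_o


def filter_step(W_q, q, a, o):
    """Stand-in recursive filter: next belief from (belief, action, observation)."""
    return [math.tanh(v) for v in matvec(W_q, q + a + o)]


def policy_mean(W_pi, q):
    """Reactive policy: the action mean depends only on the current belief."""
    return matvec(W_pi, q)


def gaussian_logp(a, mean, sigma):
    """Log-density of action a under an isotropic Gaussian policy."""
    return sum(
        -0.5 * ((ai - mi) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)
        for ai, mi in zip(a, mean)
    )


def combined_loss(params, observations, actions, rewards,
                  w_pg=1.0, w_pred=1.0, sigma=1.0):
    """Return (total, pg_term, pred_err) for one recorded trajectory.

    pg_term is a REINFORCE-style surrogate using the undiscounted return,
    with no baseline; pred_err is the mean squared error of the observation
    predicted from the belief and the action taken.
    """
    W_q, W_pi, W_o = params
    q = [0.0] * len(W_q)          # initial belief (assumed zero here)
    ret = sum(rewards)
    logp_sum, sq_err = 0.0, 0.0
    for o, a in zip(observations, actions):
        logp_sum += gaussian_logp(a, policy_mean(W_pi, q), sigma)
        o_hat = matvec(W_o, q + a)               # predict o from belief and action
        sq_err += sum((p - t) ** 2 for p, t in zip(o_hat, o))
        q = filter_step(W_q, q, a, o)            # condition belief on what happened
    pg_term = -ret * logp_sum
    pred_err = sq_err / max(len(observations), 1)
    return w_pg * pg_term + w_pred * pred_err, pg_term, pred_err


if __name__ == "__main__":
    rng = random.Random(0)
    params = init_params(belief_dim=3, act_dim=1, obs_dim=2, rng=rng)
    obs = [[0.1, -0.2], [0.0, 0.3], [0.5, 0.1]]
    acts = [[0.2], [-0.1], [0.4]]
    rews = [1.0, 0.0, -0.5]
    print(combined_loss(params, obs, acts, rews))
```

Because the policy reads only the belief `q`, all memory lives in the filter; in a real setting both terms would be differentiated with an automatic-differentiation library rather than merely evaluated as here.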
Author Information
Ahmed Hefny (Carnegie Mellon University)
Zita Marinho (Carnegie Mellon University)
Wen Sun (Carnegie Mellon University)
Siddhartha Srinivasa (University of Washington)
Geoff Gordon (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Oral: Recurrent Predictive State Policy Networks
  Thu. Jul 12th 03:40 -- 03:50 PM Room A3
More from the Same Authors
- 2021 Poster: Decomposed Mutual Information Estimation for Contrastive Representation Learning
  Alessandro Sordoni · Nouha Dziri · Hannes Schulz · Geoff Gordon · Philip Bachman · Remi Tachet des Combes
- 2021 Poster: Understanding and Mitigating Accuracy Disparity in Regression
  Jianfeng Chi · Yuan Tian · Geoff Gordon · Han Zhao
- 2021 Spotlight: Understanding and Mitigating Accuracy Disparity in Regression
  Jianfeng Chi · Yuan Tian · Geoff Gordon · Han Zhao
- 2021 Spotlight: Decomposed Mutual Information Estimation for Contrastive Representation Learning
  Alessandro Sordoni · Nouha Dziri · Hannes Schulz · Geoff Gordon · Philip Bachman · Remi Tachet des Combes
- 2021 Poster: Information Obfuscation of Graph Neural Networks
  Peiyuan Liao · Han Zhao · Keyulu Xu · Tommi Jaakkola · Geoff Gordon · Stefanie Jegelka · Ruslan Salakhutdinov
- 2021 Spotlight: Information Obfuscation of Graph Neural Networks
  Peiyuan Liao · Han Zhao · Keyulu Xu · Tommi Jaakkola · Geoff Gordon · Stefanie Jegelka · Ruslan Salakhutdinov
- 2019 Poster: Iterative Linearized Control: Stable Algorithms and Complexity Guarantees
  Vincent Roulet · Dmitriy Drusvyatskiy · Siddhartha Srinivasa · Zaid Harchaoui
- 2019 Oral: Iterative Linearized Control: Stable Algorithms and Complexity Guarantees
  Vincent Roulet · Dmitriy Drusvyatskiy · Siddhartha Srinivasa · Zaid Harchaoui
- 2019 Poster: Provably Efficient Imitation Learning from Observation Alone
  Wen Sun · Anirudh Vemula · Byron Boots · Drew Bagnell
- 2019 Oral: Provably Efficient Imitation Learning from Observation Alone
  Wen Sun · Anirudh Vemula · Byron Boots · Drew Bagnell
- 2019 Poster: Contextual Memory Trees
  Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro
- 2019 Poster: On Learning Invariant Representations for Domain Adaptation
  Han Zhao · Remi Tachet des Combes · Kun Zhang · Geoff Gordon
- 2019 Oral: On Learning Invariant Representations for Domain Adaptation
  Han Zhao · Remi Tachet des Combes · Kun Zhang · Geoff Gordon
- 2019 Oral: Contextual Memory Trees
  Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro
- 2017 Poster: Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
  Wen Sun · Arun Venkatraman · Geoff Gordon · Byron Boots · Drew Bagnell
- 2017 Talk: Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
  Wen Sun · Arun Venkatraman · Geoff Gordon · Byron Boots · Drew Bagnell
- 2017 Poster: Safety-Aware Algorithms for Adversarial Contextual Bandit
  Wen Sun · Debadeepta Dey · Ashish Kapoor
- 2017 Talk: Safety-Aware Algorithms for Adversarial Contextual Bandit
  Wen Sun · Debadeepta Dey · Ashish Kapoor