Oral
PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos
Paavo Parmas · Carl E Rasmussen · Jan Peters · Kenji Doya
The exploding gradient problem has previously been described as central to deep learning and model-based reinforcement learning because it causes numerical issues and instability in optimization. Our experiments in model-based reinforcement learning imply that the problem is not just a numerical issue, but may be caused by a fundamental chaos-like nature of long chains of nonlinear computations. Not only do the gradient magnitudes become large, but the gradient directions also become essentially random. We show that reparameterization gradients suffer from this problem, while likelihood ratio gradients are robust to it. Using these insights, we develop a model-based policy search framework, Probabilistic Inference for Particle-Based Policy Search (PIPPS), which is easily extensible and allows for almost arbitrary models and policies, while matching the performance of previous data-efficient learning algorithms. Finally, we introduce the total propagation algorithm, which efficiently computes a union over all pathwise derivative depths during a single backward pass, automatically giving greater weight to estimators with lower variance, sometimes improving over reparameterization gradients by a factor of $10^6$.
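The contrast between the two gradient estimators named in the abstract can be illustrated on a toy problem. The sketch below is not the PIPPS or total propagation algorithm; it is a minimal one-dimensional example under stated assumptions: the objective is the expectation of a rapidly oscillating function `f(x) = sin(k * x)` (a hypothetical stand-in for a chaotic rollout) under `x ~ N(mu, sigma^2)`, and the two estimators of the gradient with respect to `mu` are combined by inverse-variance weighting while ignoring their correlation. All function and parameter names are illustrative.

```python
import math
import random
import statistics


def gradient_samples(mu, sigma, k, n, seed=0):
    """Draw per-sample gradient estimates of d/dmu E[sin(k*x)], x ~ N(mu, sigma^2).

    Returns two lists: reparameterization (RP) samples and
    likelihood ratio (LR) samples, computed from the same noise draws.
    """
    rng = random.Random(seed)
    rp, lr = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        x = mu + sigma * eps
        # RP: differentiate through the sample path x = mu + sigma * eps.
        rp.append(k * math.cos(k * x))
        # LR: f(x) * d log p(x) / d mu, where (x - mu) / sigma^2 = eps / sigma.
        lr.append(math.sin(k * x) * eps / sigma)
    return rp, lr


def inverse_variance_combine(rp, lr):
    """Weight the two sample means inversely to their empirical variances.

    Assumption: the estimators are treated as independent, which is a
    simplification because both are built from the same samples.
    """
    var_rp = statistics.variance(rp)
    var_lr = statistics.variance(lr)
    w_rp = var_lr / (var_rp + var_lr)  # equals (1/var_rp) / (1/var_rp + 1/var_lr)
    w_lr = 1.0 - w_rp
    combined = w_rp * statistics.mean(rp) + w_lr * statistics.mean(lr)
    return combined, w_rp, var_rp, var_lr


if __name__ == "__main__":
    rp, lr = gradient_samples(mu=0.3, sigma=1.0, k=100.0, n=5000)
    combined, w_rp, var_rp, var_lr = inverse_variance_combine(rp, lr)
    print("RP variance:", var_rp)
    print("LR variance:", var_lr)
    print("weight on RP:", w_rp)
    print("combined gradient estimate:", combined)
```

In this toy setting the RP samples scale with the slope `k` of the function, so their variance grows as the function becomes more oscillatory, whereas the LR samples depend only on the bounded function values. The weighting therefore shifts almost entirely to the LR estimator when `k` is large.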
Author Information
Paavo Parmas (Okinawa Institute of Science and Technology Graduate University)
Carl E Rasmussen (Cambridge University)
Jan Peters (TU Darmstadt + Max Planck Institute for Intelligent Systems)
Kenji Doya (Okinawa Institute of Science and Technology)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Poster: PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos
  Thu. Jul 12th 04:15 -- 07:00 PM Room Hall B #8
More from the Same Authors
- 2021 : Exploration via Empowerment Gain: Combining Novelty, Surprise and Learning Progress
  Philip Becker-Ehmck · Maximilian Karl · Jan Peters · Patrick van der Smagt
- 2022 : Contrasting Discrete and Continuous Time Methods for Bayesian System Identification
  Talay Cheema · Carl E Rasmussen
- 2023 : Parameterized projected Bellman operator
  Théo Vincent · Alberto Maria Metelli · Jan Peters · Marcello Restelli · Carlo D'Eramo
- 2023 Poster: Model-based Reinforcement Learning with Scalable Composite Policy Gradient Estimators
  Paavo Parmas · Takuma Seno · Yuma Aoki
- 2022 Poster: Curriculum Reinforcement Learning via Constrained Optimal Transport
  Pascal Klink · Haoyi Yang · Carlo D'Eramo · Jan Peters · Joni Pajarinen
- 2022 Spotlight: Curriculum Reinforcement Learning via Constrained Optimal Transport
  Pascal Klink · Haoyi Yang · Carlo D'Eramo · Jan Peters · Joni Pajarinen
- 2021 : RL + Robotics Panel
  George Konidaris · Jan Peters · Martin Riedmiller · Angela Schoellig · Rose Yu · Rupam Mahmood
- 2021 Poster: Value Iteration in Continuous Actions, States and Time
  Michael Lutter · Shie Mannor · Jan Peters · Dieter Fox · Animesh Garg
- 2021 Spotlight: Value Iteration in Continuous Actions, States and Time
  Michael Lutter · Shie Mannor · Jan Peters · Dieter Fox · Animesh Garg
- 2021 Poster: Convex Regularization in Monte-Carlo Tree Search
  Tuan Q Dam · Carlo D'Eramo · Jan Peters · Joni Pajarinen
- 2021 Spotlight: Convex Regularization in Monte-Carlo Tree Search
  Tuan Q Dam · Carlo D'Eramo · Jan Peters · Joni Pajarinen
- 2019 Poster: Projections for Approximate Policy Iteration Algorithms
  Riad Akrour · Joni Pajarinen · Jan Peters · Gerhard Neumann
- 2019 Oral: Rates of Convergence for Sparse Variational Gaussian Process Regression
  David Burt · Carl E Rasmussen · Mark van der Wilk
- 2019 Oral: Projections for Approximate Policy Iteration Algorithms
  Riad Akrour · Joni Pajarinen · Jan Peters · Gerhard Neumann
- 2019 Poster: Overcoming Mean-Field Approximations in Recurrent Gaussian Process Models
  Alessandro Davide Ialongo · Mark van der Wilk · James Hensman · Carl E Rasmussen
- 2019 Oral: Overcoming Mean-Field Approximations in Recurrent Gaussian Process Models
  Alessandro Davide Ialongo · Mark van der Wilk · James Hensman · Carl E Rasmussen
- 2019 Poster: Rates of Convergence for Sparse Variational Gaussian Process Regression
  David Burt · Carl E Rasmussen · Mark van der Wilk