Convex Regularization in Monte-Carlo Tree Search
Monte-Carlo planning and Reinforcement Learning (RL) are essential to sequential decision making. The recent AlphaGo and AlphaZero algorithms have shown how to successfully combine these two paradigms to solve large-scale sequential decision problems. These methodologies exploit a variant of the well-known UCT algorithm to trade off the exploitation of good actions and the exploration of unvisited states, but their empirical success comes at the cost of poor sample efficiency and high computation time. In this paper, we overcome these limitations by introducing the use of convex regularization in Monte-Carlo Tree Search (MCTS) to drive exploration efficiently and to improve policy updates. First, we introduce a unifying theory on the use of generic convex regularizers in MCTS, deriving the first regret analysis of regularized MCTS and showing that it guarantees an exponential convergence rate. Second, we exploit our theoretical framework to introduce novel regularized backup operators for MCTS, based on the relative entropy of the policy update and, more importantly, on the Tsallis entropy of the policy, for which we prove superior theoretical guarantees. We empirically verify the consequences of our theoretical results on a toy problem. Finally, we show how our framework can easily be incorporated into AlphaGo, and we empirically demonstrate the superiority of convex regularization over representative baselines on several well-known Atari games.
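The abstract's core idea can be illustrated with a small numerical sketch. A convex-regularized backup replaces the hard max over action values with the convex conjugate of the regularizer: Shannon (relative) entropy yields the smooth log-sum-exp value, while Tsallis entropy (alpha = 2) yields the sparsemax policy, which can assign exactly zero probability to poor actions. This is an illustrative sketch under our own assumptions (NumPy, a temperature `tau`, and the helper names `softmax_value` / `sparsemax_policy`), not the paper's exact backup operators.

```python
import numpy as np

def softmax_value(q, tau=1.0):
    """Log-sum-exp value: tau * log(sum(exp(q / tau))).
    Convex conjugate of the negative Shannon entropy; a smooth
    upper bound on max(q). Shifted by max(q) for numerical stability."""
    q = np.asarray(q, dtype=float)
    m = q.max()
    return tau * np.log(np.exp((q - m) / tau).sum()) + m

def sparsemax_policy(q, tau=1.0):
    """Euclidean projection of q / tau onto the probability simplex
    (sparsemax): the greedy policy under Tsallis-entropy (alpha = 2)
    regularization. Unlike softmax, it truncates low-value actions
    to exactly zero probability."""
    z = np.asarray(q, dtype=float) / tau
    zs = np.sort(z)[::-1]                 # sort scores descending
    css = np.cumsum(zs)
    k = np.arange(1, z.size + 1)
    support = 1 + k * zs > css            # actions kept in the support
    k_z = k[support][-1]                  # support size
    threshold = (css[support][-1] - 1.0) / k_z
    return np.maximum(z - threshold, 0.0)
```

For example, `sparsemax_policy([3.0, 1.0, 0.2])` puts all mass on the first action, whereas a softmax policy would keep small nonzero mass everywhere; this sparsity is one intuition for why Tsallis-entropy regularization can concentrate search effort faster than Shannon-entropy regularization.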
Author Information
Tuan Q Dam (TU Darmstadt)
Carlo D'Eramo (TU Darmstadt)
Jan Peters (TU Darmstadt)
Joni Pajarinen (Aalto University)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Convex Regularization in Monte-Carlo Tree Search
  Wed. Jul 21st, 04:00 -- 06:00 AM
More from the Same Authors
- 2021: Topological Experience Replay for Fast Q-Learning
  Zhang-Wei Hong · Tao Chen · Yen-Chen Lin · Joni Pajarinen · Pulkit Agrawal
- 2021: Exploration via Empowerment Gain: Combining Novelty, Surprise and Learning Progress
  Philip Becker-Ehmck · Maximilian Karl · Jan Peters · Patrick van der Smagt
- 2023: Parameterized projected Bellman operator
  Théo Vincent · Alberto Maria Metelli · Jan Peters · Marcello Restelli · Carlo D'Eramo
- 2023: Sparse Function-space Representation of Neural Networks
  Aidan Scannell · Riccardo Mereu · Paul Chang · Ella Tamir · Joni Pajarinen · Arno Solin
- 2023 Poster: Simplified Temporal Consistency Reinforcement Learning
  Yi Zhao · Wenshuai Zhao · Rinu Boney · Kannala Juho · Joni Pajarinen
- 2023 Poster: Hierarchical Imitation Learning with Vector Quantized Models
  Kalle Kujanpää · Joni Pajarinen · Alexander Ilin
- 2022 Poster: Curriculum Reinforcement Learning via Constrained Optimal Transport
  Pascal Klink · Haoyi Yang · Carlo D'Eramo · Jan Peters · Joni Pajarinen
- 2022 Spotlight: Curriculum Reinforcement Learning via Constrained Optimal Transport
  Pascal Klink · Haoyi Yang · Carlo D'Eramo · Jan Peters · Joni Pajarinen
- 2021: RL + Robotics Panel
  George Konidaris · Jan Peters · Martin Riedmiller · Angela Schoellig · Rose Yu · Rupam Mahmood
- 2021 Poster: Value Iteration in Continuous Actions, States and Time
  Michael Lutter · Shie Mannor · Jan Peters · Dieter Fox · Animesh Garg
- 2021 Spotlight: Value Iteration in Continuous Actions, States and Time
  Michael Lutter · Shie Mannor · Jan Peters · Dieter Fox · Animesh Garg
- 2019 Poster: Projections for Approximate Policy Iteration Algorithms
  Riad Akrour · Joni Pajarinen · Jan Peters · Gerhard Neumann
- 2019 Oral: Projections for Approximate Policy Iteration Algorithms
  Riad Akrour · Joni Pajarinen · Jan Peters · Gerhard Neumann
- 2018 Poster: PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos
  Paavo Parmas · Carl E Rasmussen · Jan Peters · Kenji Doya
- 2018 Oral: PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos
  Paavo Parmas · Carl E Rasmussen · Jan Peters · Kenji Doya