Poster in Workshop: New Frontiers in Learning, Control, and Dynamical Systems
Sample Complexity of Hierarchical Decompositions in Markov Decision Processes
Arnaud Robert · Ciara Pike-Burke · Aldo Faisal
Hierarchical Reinforcement Learning (HRL) algorithms perform planning at multiple levels of abstraction. Algorithms that leverage state or temporal abstractions have empirically demonstrated gains in sample efficiency, yet the basis of those gains is not fully understood, and we still lack theoretically grounded design rules for implementing HRL algorithms. Here, we derive a lower bound on the sample complexity of a class of goal-conditioned HRL algorithms (such as Dot-to-Dot \cite{beyret2019dot}). This bound quantifies the benefits of hierarchical decomposition by making explicit the relationship between the properties of the decomposition and the sample complexity, and it inspires a simple Q-learning-type algorithm that leverages a hierarchical decomposition of the goal space. We then empirically validate the lower bound by investigating the sample complexity of this hierarchical algorithm on a spectrum of tasks designed so that their complexity can be dialed up or down over multiple orders of magnitude. Our theoretical and algorithmic results provide a clear step towards answering the foundational question of quantifying the efficiency gains induced by hierarchies in reinforcement learning.
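The abstract does not spell out the algorithm, so the following is a minimal illustrative sketch of what a goal-conditioned hierarchical Q-learning scheme of this kind can look like on a toy chain MDP: a high-level learner selects subgoals from a fixed decomposition, and a low-level goal-conditioned learner is rewarded for reaching them. The environment, the subgoal set `subgoals`, the horizons, and all hyperparameters are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch only: goal-conditioned hierarchical Q-learning on a
# toy chain MDP. Environment, subgoal set, and hyperparameters are assumed.
import numpy as np

rng = np.random.default_rng(0)

N = 16                      # chain states 0..N-1; start at 0, final goal N-1
H = 4                       # low-level step budget per subgoal attempt
MAX_T = 200                 # per-episode step budget
ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1
subgoals = [3, 7, 11, 15]   # hand-picked goal decomposition (an assumption)

Q_hi = np.zeros((N, len(subgoals)))  # high level: state -> subgoal index
Q_lo = np.zeros((N, N, 2))           # low level: (state, goal) -> action

def step(s, a):
    """Chain dynamics: action 0 moves left, action 1 moves right (clipped)."""
    return max(0, min(N - 1, s + (1 if a == 1 else -1)))

def eps_greedy(q, eps):
    """Epsilon-greedy with uniform tie-breaking over maximizing actions."""
    if rng.random() < eps:
        return int(rng.integers(len(q)))
    return int(rng.choice(np.flatnonzero(q == q.max())))

for episode in range(500):
    s, t = 0, 0
    while s != N - 1 and t < MAX_T:
        g_idx = eps_greedy(Q_hi[s], EPS)   # high level commits to a subgoal
        g, s0 = subgoals[g_idx], s
        for _ in range(H):                 # low level tries to reach g
            a = eps_greedy(Q_lo[s, g], EPS)
            s2 = step(s, a)
            t += 1
            r = float(s2 == g)             # intrinsic goal-reaching reward
            target = r + GAMMA * Q_lo[s2, g].max() * (s2 != g)
            Q_lo[s, g, a] += ALPHA * (target - Q_lo[s, g, a])
            s = s2
            if s == g or t >= MAX_T:
                break
        # One-step SMDP-style update at the high level (discounting over the
        # subgoal attempt's duration is omitted for brevity in this sketch).
        r_hi = float(s == N - 1)           # extrinsic reward only at the end
        Q_hi[s0, g_idx] += ALPHA * (
            r_hi + GAMMA * Q_hi[s].max() * (s != N - 1) - Q_hi[s0, g_idx])
```

In a scheme like this, the granularity of the subgoal set governs the effective horizon each level faces, which is exactly the kind of decomposition property that a sample-complexity lower bound can relate to the efficiency gains of the hierarchy.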