Poster
Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies
Ilyas Fatkhullin · Anas Barakat · Anastasia Kireeva · Niao He
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed the development of their theoretical foundations. Despite the huge efforts directed at the design of efficient stochastic PG-type algorithms, the understanding of their convergence to a globally optimal policy is still limited. In this work, we develop improved global convergence guarantees for a general class of Fisher-non-degenerate parameterized policies, which allows us to address the case of continuous state-action spaces. First, we propose a Normalized Policy Gradient method with Implicit Gradient Transport (N-PG-IGT) and derive a $\tilde{\mathcal{O}}(\varepsilon^{-2.5})$ sample complexity of this method for finding a global $\varepsilon$-optimal policy. Improving over the previously known $\tilde{\mathcal{O}}(\varepsilon^{-3})$ complexity, this algorithm does not require the use of importance sampling or second-order information and samples only one trajectory per iteration. Second, we further improve this complexity to $\tilde{\mathcal{O}}(\varepsilon^{-2})$ by considering a Hessian-Aided Recursive Policy Gradient ((N)-HARPG) algorithm enhanced with a correction based on a Hessian-vector product. Interestingly, both algorithms are $(i)$ simple and easy to implement: they are single-loop, do not require large batches of trajectories, and sample at most two trajectories per iteration; $(ii)$ computationally and memory efficient: they do not require expensive subroutines at each iteration and can be implemented with memory linear in the dimension of the parameters.
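The abstract describes the two update rules only in words. The sketch below is a minimal illustration of how they might look on a toy one-dimensional problem with a Gaussian policy and a REINFORCE-style single-sample gradient estimator. The step-size and momentum schedules, the finite-difference Hessian-vector product, and all function names are illustrative assumptions, not the authors' implementation or the exact estimators analyzed in the paper.

```python
# Illustrative sketch (not the authors' code) of the two update rules described in the
# abstract, on a toy one-step problem: a Gaussian policy a ~ N(theta, sigma^2) with
# reward r(a) = -(a - 3)^2, so the optimal mean is theta* = 3.  All schedules and
# estimators below are illustrative choices, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
SIGMA, TARGET = 1.0, 3.0

def sample_grad(theta, seed=None):
    """Single-'trajectory' REINFORCE estimate of the policy gradient at theta."""
    local = np.random.default_rng(seed) if seed is not None else rng
    a = theta + SIGMA * local.standard_normal()
    reward = -(a - TARGET) ** 2
    score = (a - theta) / SIGMA ** 2          # grad_theta log N(a; theta, sigma^2)
    return reward * score

def sample_hvp(theta, v, seed, eps=1e-4):
    """Stochastic Hessian-vector product along direction v.
    Approximated here by a finite difference of the stochastic gradient with common
    random numbers; the paper uses an unbiased policy-Hessian estimator instead."""
    return (sample_grad(theta + eps * v, seed) - sample_grad(theta - eps * v, seed)) / (2 * eps)

def n_pg_igt(T=2000, theta0=0.0):
    """Normalized PG with Implicit Gradient Transport: one trajectory per iteration."""
    theta_prev = theta = theta0
    d = 0.0
    for t in range(1, T + 1):
        eta = 1.0 / t ** 0.6                  # illustrative momentum schedule
        gamma = 0.5 / t ** 0.8                # illustrative step-size schedule
        # gradient queried at an extrapolated ("transported") point
        theta_bar = theta + (1.0 - eta) / eta * (theta - theta_prev)
        d = (1.0 - eta) * d + eta * sample_grad(theta_bar)
        theta_prev, theta = theta, theta + gamma * d / max(abs(d), 1e-12)  # normalized step
    return theta

def n_harpg(T=2000, theta0=0.0):
    """Hessian-Aided Recursive PG: recursive estimator with a Hessian-vector-product
    correction, at most two sampled 'trajectories' per iteration."""
    theta_prev = theta = theta0
    d = sample_grad(theta0)
    for t in range(1, T + 1):
        beta = 1.0 / (t + 1)                  # illustrative mixing weight
        gamma = 0.5 / t ** 0.5                # illustrative step-size schedule
        seed = int(rng.integers(1 << 30))
        # HVP evaluated at a uniformly random point on the segment [theta_prev, theta]
        q = rng.uniform()
        theta_hat = q * theta + (1.0 - q) * theta_prev
        correction = sample_hvp(theta_hat, theta - theta_prev, seed)
        d = (1.0 - beta) * (d + correction) + beta * sample_grad(theta)
        theta_prev, theta = theta, theta + gamma * d / max(abs(d), 1e-12)  # normalized step
    return theta

if __name__ == "__main__":
    print("N-PG-IGT  estimate of theta*:", round(n_pg_igt(), 3))
    print("(N)-HARPG estimate of theta*:", round(n_harpg(), 3))
```

Both loops mirror the properties highlighted in the abstract: they are single-loop, use at most one or two sampled trajectories per iteration, and keep only the current parameter, the previous parameter, and one gradient-sized momentum vector in memory.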
Author Information
Ilyas Fatkhullin (ETH Zurich)
Anas Barakat (ETH Zurich)
Anastasia Kireeva (ETH Zurich)
Niao He (ETH Zurich)
More from the Same Authors
- 2021 : Sample Complexity and Overparameterization Bounds for Temporal Difference Learning with Neural Network Approximation »
  Semih Cayci · Siddhartha Satpathi · Niao He · R Srikant
- 2021 : EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback »
  Peter Richtarik · Ilyas Fatkhullin
- 2021 : Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation »
  Semih Cayci · Niao He · R Srikant
- 2023 : Momentum Provably Improves Error Feedback! »
  Ilyas Fatkhullin · Alexander Tyurin · Peter Richtarik
- 2023 Poster: Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space »
  Anas Barakat · Ilyas Fatkhullin · Niao He
- 2023 Poster: Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games »
  Batuhan Yardim · Semih Cayci · Matthieu Geist · Niao He
- 2022 Poster: 3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation »
  Peter Richtarik · Igor Sokolov · Elnur Gasanov · Ilyas Fatkhullin · Zhize Li · Eduard Gorbunov
- 2022 Spotlight: 3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation »
  Peter Richtarik · Igor Sokolov · Elnur Gasanov · Ilyas Fatkhullin · Zhize Li · Eduard Gorbunov
- 2022 Poster: A Natural Actor-Critic Framework for Zero-Sum Markov Games »
  Ahmet Alacaoglu · Luca Viano · Niao He · Volkan Cevher
- 2022 Spotlight: A Natural Actor-Critic Framework for Zero-Sum Markov Games »
  Ahmet Alacaoglu · Luca Viano · Niao He · Volkan Cevher
- 2021 Workshop: Workshop on Reinforcement Learning Theory »
  Shipra Agrawal · Simon Du · Niao He · Csaba Szepesvari · Lin Yang