Natural policy gradient (NPG) methods with function approximation achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, theoretical understanding of their convergence behavior remains limited in the function approximation setting. In this paper, we perform a finite-time analysis of NPG with linear function approximation and softmax parameterization, and prove for the first time that the widely used entropy regularization method, which encourages exploration, leads to a linear convergence rate. We adopt a Lyapunov drift analysis to establish the convergence results and to explain how entropy regularization improves the convergence rate.
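The linear convergence promised by entropy regularization can be seen in a minimal sketch (my illustration, not the paper's setting, which uses linear function approximation): on a one-state MDP (a bandit) with exact softmax parameterization, the entropy-regularized NPG step has the closed form pi_{t+1} proportional to pi_t^{1 - eta*lam} * exp(eta*r), and the policy contracts geometrically toward the regularized optimum softmax(r/lam). The rewards, step size, and regularization strength below are toy values chosen for illustration.

```python
import numpy as np

r = np.array([1.0, 0.5, 0.2])   # toy reward vector (assumed values)
lam = 0.1                       # entropy-regularization strength
eta = 2.0                       # step size; need eta * lam <= 1

pi = np.ones_like(r) / len(r)   # uniform initial policy
# Entropy-regularized optimum: softmax of r / lam
pi_star = np.exp(r / lam) / np.exp(r / lam).sum()

for t in range(50):
    # Closed-form entropy-regularized NPG step for the softmax bandit:
    # new logits mix the old log-policy (shrunk by 1 - eta*lam) with eta*r,
    # so log pi contracts toward log pi_star at rate (1 - eta*lam).
    logits = (1 - eta * lam) * np.log(pi) + eta * r
    pi = np.exp(logits - logits.max())  # subtract max for numerical stability
    pi /= pi.sum()

gap = np.abs(pi - pi_star).sum()  # total-variation-style gap, shrinks geometrically in t
```

With eta * lam = 0.2, each step multiplies the log-policy deviation by 0.8, which is the geometric (linear) convergence rate; without regularization (lam = 0) the contraction factor is 1 and convergence is only sublinear.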
Author Information
Semih Cayci (University of Illinois at Urbana-Champaign)
Niao He (ETH Zurich)
R Srikant (UIUC)
More from the Same Authors
- 2021: Sample Complexity and Overparameterization Bounds for Temporal Difference Learning with Neural Network Approximation
  Semih Cayci · Siddhartha Satpathi · Niao He · R Srikant
- 2023 Poster: Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space
  Anas Barakat · Ilyas Fatkhullin · Niao He
- 2023 Poster: Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies
  Ilyas Fatkhullin · Anas Barakat · Anastasia Kireeva · Niao He
- 2023 Poster: Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games
  Batuhan Yardim · Semih Cayci · Matthieu Geist · Niao He
- 2023 Poster: Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits
  Ronshee Chawla · Daniel Vial · Sanjay Shakkottai · R Srikant
- 2022 Poster: Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation
  Daniel Vial · Advait Parulekar · Sanjay Shakkottai · R Srikant
- 2022 Poster: A Natural Actor-Critic Framework for Zero-Sum Markov Games
  Ahmet Alacaoglu · Luca Viano · Niao He · Volkan Cevher
- 2022 Spotlight: A Natural Actor-Critic Framework for Zero-Sum Markov Games
  Ahmet Alacaoglu · Luca Viano · Niao He · Volkan Cevher
- 2022 Spotlight: Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation
  Daniel Vial · Advait Parulekar · Sanjay Shakkottai · R Srikant
- 2021 Workshop: Workshop on Reinforcement Learning Theory
  Shipra Agrawal · Simon Du · Niao He · Csaba Szepesvari · Lin Yang
- 2018 Poster: Understanding the Loss Surface of Neural Networks for Binary Classification
  Shiyu Liang · Ruoyu Sun · Yixuan Li · R Srikant
- 2018 Oral: Understanding the Loss Surface of Neural Networks for Binary Classification
  Shiyu Liang · Ruoyu Sun · Yixuan Li · R Srikant