
Invited Speaker: Bo Dai: Leveraging Non-uniformity in Policy Gradient
Bo Dai

Sat Jul 24 02:00 PM -- 02:25 PM (PDT) @

Policy gradient is one of the state-of-the-art algorithm families in reinforcement learning and has been proved to be globally convergent. Motivated by properties of the accumulated reward in MDPs, we propose non-uniform refinements of the smoothness condition (NS) and the Łojasiewicz condition (NŁ). These new definitions inspire geometry-aware first-order policy gradient methods that converge to global optimality at a linear rate while incurring less overhead than existing algorithms, e.g., natural/mirror policy gradient. Similarly, we show that geometry-aware normalized gradient descent can achieve a linear convergence rate in fitting generalized linear models. Experimental results are used to illustrate and complement the theoretical findings.
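The normalized-gradient idea in the abstract can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the talk's method: it runs plain normalized gradient descent (step direction g/||g||) on a logistic-regression loss, whereas the geometry-aware variant discussed in the talk rescales the step using a problem-dependent non-uniform smoothness factor. All function names and parameters here are hypothetical.

```python
import numpy as np

def normalized_gd_logistic(X, y, eta=0.1, steps=500):
    """Normalized gradient descent on the mean logistic loss.

    Toy sketch: the update direction is g / ||g||, so progress does not
    stall as the raw gradient vanishes near the optimum. The talk's
    geometry-aware method instead scales steps by a non-uniform
    smoothness estimate; that refinement is omitted here.
    """
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        g = X.T @ (p - y) / len(y)         # gradient of mean log-loss
        norm = np.linalg.norm(g)
        if norm < 1e-12:                   # gradient vanished; stop
            break
        w -= eta * g / norm                # unit-length (normalized) step
    return w
```

On linearly separable data this drives the training loss toward zero at a rate governed by the fixed step length rather than the shrinking gradient magnitude.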

Author Information

Bo Dai (Google Brain)
