
Workshop: Workshop on Reinforcement Learning Theory

Finding the Near Optimal Policy via Reductive Regularization in MDPs

Wenhao Yang · Xiang Li · Guangzeng Xie · Zhihua Zhang

Abstract: Regularized Markov decision processes (MDPs) serve as a smoothed version of ordinary MDPs that encourages exploration. However, the optimal policy of a regularized MDP is often biased when it is evaluated under the value function of the original MDP. Rather than fixing the coefficient $\lambda$ of the regularization term at a sufficiently small value, we propose a scheme that gradually reduces $\lambda$ to approximate the optimal policy of the original MDP. We prove that, for both dynamic programming and policy gradient methods, the iteration complexity of obtaining an $\varepsilon$-optimal policy can be maintained or even reduced compared with simply setting a sufficiently small $\lambda$. In addition, there is a strong duality connection between the reduction method and solving the original MDP directly, from which we can derive more adaptive reduction methods for certain reinforcement learning algorithms.
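The idea of shrinking $\lambda$ instead of fixing it small can be illustrated with a toy dynamic-programming sketch. This is not the authors' algorithm: it assumes Shannon-entropy regularization (so the regularized Bellman operator uses $\lambda \log \sum_a \exp(Q(s,a)/\lambda)$ in place of $\max_a$), a hypothetical geometric halving schedule with warm starts, and a small random MDP. The function names (`soft_max_operator`, `soft_value_iteration`, `reductive_regularization`) are illustrative only. For entropy regularization it is a standard fact that $0 \le V^*_\lambda - V^* \le \lambda \log|A| / (1-\gamma)$, which is the bias that shrinking $\lambda$ drives toward zero.

```python
import numpy as np


def soft_max_operator(Q, lam):
    """lam * log sum_a exp(Q[s, a] / lam): entropy-regularized 'max' over actions (assumed regularizer)."""
    m = Q.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    return (m + lam * np.log(np.exp((Q - m) / lam).sum(axis=1, keepdims=True))).ravel()


def soft_value_iteration(P, R, gamma, lam, V0, tol=1e-8, max_iter=100_000):
    """Iterate the (regularized) Bellman operator; lam == 0 gives ordinary value iteration.

    Returns the approximate fixed point and the number of iterations used."""
    V = V0.copy()
    for it in range(1, max_iter + 1):
        Q = R + gamma * P @ V  # Q[s, a] = r(s, a) + gamma * E[V(s') | s, a]
        V_new = soft_max_operator(Q, lam) if lam > 0 else Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, it
        V = V_new
    return V, max_iter


def reductive_regularization(P, R, gamma, lam0, n_stages, shrink=0.5, tol=1e-8):
    """Hypothetical schedule: solve at lam0, then shrink lam geometrically,
    warm-starting each stage from the previous stage's value estimate."""
    V = np.zeros(R.shape[0])
    lams = lam0 * shrink ** np.arange(n_stages)
    total_iters = 0
    for lam in lams:
        V, iters = soft_value_iteration(P, R, gamma, lam, V, tol)
        total_iters += iters
    return V, lams[-1], total_iters


# Toy random MDP: P has shape (S, A, S), R has shape (S, A).
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize into transition probabilities
R = rng.random((S, A))

V_star, _ = soft_value_iteration(P, R, gamma, 0.0, np.zeros(S))  # unregularized optimum
V_red, lam_last, iters = reductive_regularization(P, R, gamma, lam0=1.0, n_stages=8)
V_direct, iters_direct = soft_value_iteration(P, R, gamma, lam_last, np.zeros(S))

gap = np.max(np.abs(V_red - V_star))
bound = lam_last * np.log(A) / (1 - gamma)  # entropy-regularization bias bound
print(f"final lambda={lam_last:.4f}, gap={gap:.2e}, bound={bound:.2e}")
print(f"iterations: reductive={iters}, direct small lambda={iters_direct}")
```

The printed iteration counts describe only this toy setup with a fixed stopping tolerance; they are not meant to reproduce the complexity results claimed in the abstract, which concern reaching an $\varepsilon$-optimal policy of the original MDP.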
