
Finite-Sample Analysis of Off-Policy Natural Actor-Critic With Linear Function Approximation
Zaiwei Chen · Sajad Khodadadian · Siva Maguluri
In this paper, we develop a novel variant of the off-policy natural actor-critic algorithm with linear function approximation, and we establish a sample complexity of $\mathcal{O}(\epsilon^{-3})$, outperforming all previously known convergence bounds for such algorithms. To overcome the divergence caused by the deadly triad in off-policy policy evaluation under function approximation, we develop a critic that employs an $n$-step TD-learning algorithm with a properly chosen $n$. We derive our sample complexity bounds solely under the assumption that the behavior policy sufficiently explores all states and actions, which is a much lighter assumption than those made in the related literature.
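To illustrate the critic's core mechanism, the following is a minimal sketch of n-step TD(0) with linear function approximation on a hypothetical toy chain MDP. It is not the paper's off-policy critic (no behavior policy or importance corrections appear here); it only shows how the n-step return combines rewards with a bootstrapped linear value estimate $V(s) \approx w^\top \phi(s)$.

```python
import numpy as np

def n_step_td(n=2, gamma=0.9, alpha=0.1, num_episodes=500):
    """n-step TD(0) with linear features on a toy deterministic chain.

    Chain: s0 -> s1 -> s2 (terminal), reward 1 on entering the terminal
    state, 0 otherwise. True values: V(s1) = 1, V(s0) = gamma = 0.9.
    This is an illustrative on-policy sketch, not the paper's algorithm.
    """
    phi = np.eye(3)                 # one-hot features (tabular special case)
    w = np.zeros(3)                 # linear weights: V(s) ~= w @ phi[s]
    for _ in range(num_episodes):
        states = [0, 1, 2]          # s_0, s_1, s_2 (terminal)
        rewards = [0.0, 1.0]        # r_1, r_2
        T = len(rewards)            # episode length
        for t in range(T):
            # n-step return: discounted rewards up to min(t+n, T) steps ahead,
            # plus a bootstrap term if s_{t+n} is non-terminal.
            end = min(t + n, T)
            G = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
            if t + n < T:           # bootstrap only from non-terminal states
                G += gamma ** n * (w @ phi[states[t + n]])
            # Semi-gradient linear TD update toward the n-step target
            w += alpha * (G - w @ phi[states[t]]) * phi[states[t]]
    return w

w = n_step_td()
print(np.round(w, 3))  # approx [0.9, 1.0, 0.0]
```

The choice of n trades off bias from bootstrapping against variance from longer reward sums; the paper's point is that a properly chosen n controls the divergence risk of off-policy bootstrapping with function approximation.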

Author Information

Zaiwei Chen (Georgia Institute of Technology)
Sajad Khodadadian (Georgia Institute of Technology)
Siva Maguluri (Georgia Tech)
