

Poster in Workshop: Workshop on Reinforcement Learning Theory

Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

Zaiwei Chen · Siva Maguluri · Sanjay Shakkottai · Karthikeyan Shanmugam


Abstract: Policy evaluation (including multi-step off-policy importance sampling) can be interpreted as solving a generalized Bellman equation. In this paper, we derive finite-sample bounds for a general off-policy TD-like stochastic approximation algorithm that solves for the fixed point of this generalized Bellman operator. Our key step is to show that the generalized Bellman operator is simultaneously a contraction mapping with respect to a weighted $\ell_p$-norm for every $p$ in $[1,\infty)$, with a common contraction factor. Our results immediately imply finite-sample bounds for variants of off-policy TD-learning algorithms in the literature (e.g., $Q^\pi(\lambda)$, Tree-Backup, Retrace, and $Q$-trace).
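For concreteness, a common form of such a multi-step off-policy operator from the literature is sketched below in illustrative notation (behavior policy $\mu$, target policy $\pi$, discount $\gamma$, and algorithm-specific trace coefficients $c_k$); the paper's generalized Bellman operator may be parameterized differently:

$$
(\mathcal{T}_c Q)(s,a) \;=\; Q(s,a) \;+\; \mathbb{E}_{\mu}\!\left[\sum_{t \ge 0} \gamma^{t} \Big(\prod_{k=1}^{t} c_k\Big)\, \delta_t \,\middle|\, s_0 = s,\, a_0 = a\right],
\qquad
\delta_t \;=\; r_t \;+\; \gamma\, \mathbb{E}_{a' \sim \pi}\big[Q(s_{t+1}, a')\big] \;-\; Q(s_t, a_t).
$$

Different choices of the trace coefficients recover the listed variants, e.g., $c_k = \lambda$ yields $Q^\pi(\lambda)$, $c_k = \lambda\, \pi(a_k \mid s_k)$ yields Tree-Backup, and $c_k = \lambda \min\!\big(1, \pi(a_k \mid s_k)/\mu(a_k \mid s_k)\big)$ yields Retrace.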
