Poster in Workshop: Workshop on Reinforcement Learning Theory
Nonstationary Reinforcement Learning with Linear Function Approximation
Huozhi Zhou · Jinglin Chen · Lav Varshney · Ashish Jagmohan
Abstract:
We consider reinforcement learning (RL) in episodic Markov decision processes (MDPs) with linear function approximation in a drifting environment. Specifically, both the reward and state transition functions can evolve over time, constrained so that their total variations do not exceed a given \textit{variation budget}. We first develop the \texttt{LSVI-UCB-Restart} algorithm, an optimistic modification of least-squares value iteration combined with periodic restarts, and bound its dynamic regret when the variation budgets are known. We then propose a parameter-free algorithm, \texttt{Ada-LSVI-UCB-Restart}, which works without knowing the variation budgets but attains a slightly worse dynamic regret bound. We also derive the first minimax dynamic regret lower bound for nonstationary linear MDPs, showing that our proposed algorithms are near-optimal. As a byproduct, we establish a minimax regret lower bound for linear MDPs, a problem left open by \citet{jin2020provably}.
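To make the algorithmic idea concrete, the following is a minimal, illustrative Python sketch of optimistic least-squares value iteration with periodic restarts on a toy drifting linear MDP. Everything here is an assumption for illustration: the environment class, the one-hot features, the bonus coefficient beta, and the restart period W are hypothetical choices, not the paper's actual construction, constants, or tuning.

# Sketch of LSVI-UCB with periodic restarts (assumed details, not the paper's exact algorithm).
import numpy as np

rng = np.random.default_rng(0)


class ToyLinearMDP:
    """Tiny synthetic episodic MDP with S states, A actions, horizon H.

    Features are one-hot over (state, action), so d = S * A.  The mean rewards
    drift slowly across episodes to mimic a bounded variation budget.
    """

    def __init__(self, S=5, A=3, H=4, drift=1e-3):
        self.S, self.A, self.H, self.d = S, A, H, S * A
        self.drift = drift
        self.P = rng.dirichlet(np.ones(S), size=(S, A))  # transition kernel P[s, a]
        self.R = rng.uniform(size=(S, A))                # mean rewards in [0, 1]

    def phi(self, s, a):
        x = np.zeros(self.d)
        x[s * self.A + a] = 1.0
        return x

    def reset(self):
        # Perturb rewards between episodes (the drifting/nonstationary setting).
        self.R = np.clip(self.R + self.drift * rng.normal(size=self.R.shape), 0, 1)
        return 0  # fixed start state

    def step(self, s, a):
        s_next = rng.choice(self.S, p=self.P[s, a])
        return s_next, self.R[s, a]


def run_lsvi_ucb_restart(env, K=200, W=50, beta=1.0, lam=1.0):
    """Run K episodes; wipe the collected data every W episodes (the restart)."""
    H, d, A = env.H, env.d, env.A
    data = [[] for _ in range(H)]          # per-step tuples (phi, reward, next state)
    total_reward = 0.0

    for k in range(K):
        if k % W == 0:
            data = [[] for _ in range(H)]  # periodic restart: forget stale data

        w = np.zeros((H, d))
        Lam_inv = [np.eye(d) / lam for _ in range(H)]

        def q_value(h, s, a):
            # Optimistic Q-value: least-squares estimate plus an elliptical bonus.
            x = env.phi(s, a)
            bonus = beta * np.sqrt(x @ Lam_inv[h] @ x)
            return min(w[h] @ x + bonus, H)

        # Backward pass: regularized least squares at each step h.
        for h in reversed(range(H)):
            if not data[h]:
                continue
            Phi = np.array([x for (x, _, _) in data[h]])
            Lam_inv[h] = np.linalg.inv(Phi.T @ Phi + lam * np.eye(d))
            # Regression targets: reward plus optimistic value at the next step.
            y = np.array([
                r + (0.0 if h == H - 1 else
                     max(q_value(h + 1, s_next, a) for a in range(A)))
                for (_, r, s_next) in data[h]
            ])
            w[h] = Lam_inv[h] @ Phi.T @ y

        # Forward pass: act greedily with respect to the optimistic Q-values.
        s = env.reset()
        for h in range(H):
            a = max(range(A), key=lambda act: q_value(h, s, act))
            s_next, r = env.step(s, a)
            data[h].append((env.phi(s, a), r, s_next))
            total_reward += r
            s = s_next

    return total_reward


if __name__ == "__main__":
    print("cumulative reward:", run_lsvi_ucb_restart(ToyLinearMDP()))

The restart period W plays the role that the variation budget plays in the paper: when the budget is known it can be tuned accordingly, whereas the parameter-free \texttt{Ada-LSVI-UCB-Restart} variant described in the abstract adapts without that knowledge.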