Skip to yearly menu bar Skip to main content

Workshop: Workshop on Reinforcement Learning Theory

Nonstationary Reinforcement Learning with Linear Function Approximation

Huozhi Zhou · Jinglin Chen · Lav Varshney · Ashish Jagmohan

Abstract: We consider reinforcement learning (RL) in episodic Markov decision processes (MDPs) with linear function approximation under drifting environment. Specifically, both the reward and state transition functions can evolve over time, constrained so their total variations do not exceed a given \textit{variation budget}. We first develop $\texttt{LSVI-UCB-Restart}$ algorithm, an optimistic modification of least-squares value iteration combined with periodic restart, and bound its dynamic regret when variation budgets are known. We then propose a parameter-free algorithm that works without knowing the variation budgets, \texttt{Ada-LSVI-UCB-Restart}, but with a slightly worse dynamic regret bound. We also derive the first minimax dynamic regret lower bound for nonstationary linear MDPs to show our proposed algorithms are near-optimal. As a byproduct, we establish a minimax regret lower bound for linear MDPs, which was unsolved by~\citet{jin2020provably}.

Chat is not available.