Poster in Workshop: Workshop on Reinforcement Learning Theory
Nonstationary Reinforcement Learning with Linear Function Approximation
Huozhi Zhou · Jinglin Chen · Lav Varshney · Ashish Jagmohan
Abstract:
We consider reinforcement learning (RL) in episodic Markov decision processes (MDPs) with linear function approximation in a drifting environment. Specifically, both the reward and state transition functions can evolve over time, constrained so that their total variations do not exceed a given \textit{variation budget}. We first develop the \texttt{LSVI-UCB-Restart} algorithm, an optimistic modification of least-squares value iteration combined with periodic restarts, and bound its dynamic regret when the variation budgets are known. We then propose a parameter-free algorithm, \texttt{Ada-LSVI-UCB-Restart}, which works without knowing the variation budgets but attains a slightly worse dynamic regret bound. We also derive the first minimax dynamic regret lower bound for nonstationary linear MDPs, showing that our proposed algorithms are near-optimal. As a byproduct, we establish a minimax regret lower bound for linear MDPs, a problem left open by \citet{jin2020provably}.
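To make the algorithmic idea concrete, the following is a minimal, illustrative Python sketch of optimistic least-squares value iteration with periodic restarts on a toy drifting linear MDP. Everything here is an assumption for illustration: the environment class, the one-hot features, the bonus coefficient beta, and the restart period W are hypothetical choices, not the paper's actual construction, constants, or tuning.

# Sketch of LSVI-UCB with periodic restarts (assumed details, not the paper's exact algorithm).
import numpy as np

rng = np.random.default_rng(0)


class ToyLinearMDP:
    """Tiny synthetic episodic MDP with S states, A actions, horizon H.

    Features are one-hot over (state, action), so d = S * A.  The mean rewards
    drift slowly across episodes to mimic a bounded variation budget.
    """

    def __init__(self, S=5, A=3, H=4, drift=1e-3):
        self.S, self.A, self.H, self.d = S, A, H, S * A
        self.drift = drift
        self.P = rng.dirichlet(np.ones(S), size=(S, A))  # transition kernel P[s, a]
        self.R = rng.uniform(size=(S, A))                # mean rewards in [0, 1]

    def phi(self, s, a):
        x = np.zeros(self.d)
        x[s * self.A + a] = 1.0
        return x

    def reset(self):
        # Perturb rewards between episodes (the drifting/nonstationary setting).
        self.R = np.clip(self.R + self.drift * rng.normal(size=self.R.shape), 0, 1)
        return 0  # fixed start state

    def step(self, s, a):
        s_next = rng.choice(self.S, p=self.P[s, a])
        return s_next, self.R[s, a]


def run_lsvi_ucb_restart(env, K=200, W=50, beta=1.0, lam=1.0):
    """Run K episodes; wipe the collected data every W episodes (the restart)."""
    H, d, A = env.H, env.d, env.A
    data = [[] for _ in range(H)]          # per-step tuples (phi, reward, next state)
    total_reward = 0.0

    for k in range(K):
        if k % W == 0:
            data = [[] for _ in range(H)]  # periodic restart: forget stale data

        w = np.zeros((H, d))
        Lam_inv = [np.eye(d) / lam for _ in range(H)]

        def q_value(h, s, a):
            # Optimistic Q-value: least-squares estimate plus an elliptical bonus.
            x = env.phi(s, a)
            bonus = beta * np.sqrt(x @ Lam_inv[h] @ x)
            return min(w[h] @ x + bonus, H)

        # Backward pass: regularized least squares at each step h.
        for h in reversed(range(H)):
            if not data[h]:
                continue
            Phi = np.array([x for (x, _, _) in data[h]])
            Lam_inv[h] = np.linalg.inv(Phi.T @ Phi + lam * np.eye(d))
            # Regression targets: reward plus optimistic value at the next step.
            y = np.array([
                r + (0.0 if h == H - 1 else
                     max(q_value(h + 1, s_next, a) for a in range(A)))
                for (_, r, s_next) in data[h]
            ])
            w[h] = Lam_inv[h] @ Phi.T @ y

        # Forward pass: act greedily with respect to the optimistic Q-values.
        s = env.reset()
        for h in range(H):
            a = max(range(A), key=lambda act: q_value(h, s, act))
            s_next, r = env.step(s, a)
            data[h].append((env.phi(s, a), r, s_next))
            total_reward += r
            s = s_next

    return total_reward


if __name__ == "__main__":
    print("cumulative reward:", run_lsvi_ucb_restart(ToyLinearMDP()))

The restart period W plays the role that the variation budget plays in the paper: when the budget is known it can be tuned accordingly, whereas the parameter-free \texttt{Ada-LSVI-UCB-Restart} variant described in the abstract adapts without that knowledge.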