Poster
Learning Stochastic Shortest Path with Linear Function Approximation
Yifei Min · Jiafan He · Tianhao Wang · Quanquan Gu
Hall E #805
Keywords: [ T: Reinforcement Learning and Planning ] [ Reinforcement Learning ]
Abstract:
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models. We call this class of SSP problems as linear mixture SSPs. We propose a novel algorithm with Hoeffding-type confidence sets for learning the linear mixture SSP, which can attain an regret. Here is the number of episodes, is the dimension of the feature mapping in the mixture model, bounds the expected cumulative cost of the optimal policy, and is the lower bound of the cost function.Our algorithm also applies to the case when , and an regret is guaranteed. To the best of our knowledge, this is the first algorithm with a sublinear regret guarantee for learning linear mixture SSP. Moreover, we design a refined Bernstein-type confidence set and propose an improved algorithm, which provably achieves an regret.In complement to the regret upper bounds, we also prove a lower bound of . Hence, our improved algorithm matches the lower bound up to a factor and poly-logarithmic factors, achieving a near-optimal regret guarantee.
Chat is not available.