Skip to yearly menu bar Skip to main content


Poster

Learning Stochastic Shortest Path with Linear Function Approximation

Yifei Min · Jiafan He · Tianhao Wang · Quanquan Gu

Hall E #805

Keywords: [ T: Reinforcement Learning and Planning ] [ Reinforcement Learning ]


Abstract: We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models. We call this class of SSP problems as linear mixture SSPs. We propose a novel algorithm with Hoeffding-type confidence sets for learning the linear mixture SSP, which can attain an O~(dB1.5K/cmin) regret. Here K is the number of episodes, d is the dimension of the feature mapping in the mixture model, B bounds the expected cumulative cost of the optimal policy, and cmin>0 is the lower bound of the cost function.Our algorithm also applies to the case when cmin=0, and an O~(K2/3) regret is guaranteed. To the best of our knowledge, this is the first algorithm with a sublinear regret guarantee for learning linear mixture SSP. Moreover, we design a refined Bernstein-type confidence set and propose an improved algorithm, which provably achieves an O~(dBK/cmin) regret.In complement to the regret upper bounds, we also prove a lower bound of Ω(dBK). Hence, our improved algorithm matches the lower bound up to a 1/cmin factor and poly-logarithmic factors, achieving a near-optimal regret guarantee.

Chat is not available.