Skip to yearly menu bar Skip to main content


Oral

Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP

Liyu Chen · Rahul Jain · Haipeng Luo

Hall F

Abstract: We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem with a linear MDP that significantly improve over the only existing results of (Vial et al., 2021).Our first algorithm is computationally efficient and achieves a regret bound O(d3B2TK)O(d3B2TK), where d is the dimension of the feature space, B and T are upper bounds of the expected costs and hitting time of the optimal policy respectively, and K is the number of episodes.The same algorithm with a slight modification also achieves logarithmic regret of order O(d3B4c2min\rm gapminln5dBKcmin), where \rm gapmin is the minimum sub-optimality gap and cmin is the minimum cost over all state-action pairs.Our result is obtained by developing a simpler and improved analysis for the finite-horizon approximation of (Cohen et al., 2021) with a smaller approximation error, which might be of independent interest.On the other hand, using variance-aware confidence sets in a global optimization problem,our second algorithm is computationally inefficient but achieves the first horizon-free'' regret bound O(d3.5BK) with no polynomial dependency on T or 1/cmin,almost matching the Ω(dBK) lower bound from (Min et al., 2021).

Chat is not available.