Timezone: »

Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Andrea Zanette · Emma Brunskill

Tue Jun 11 06:30 PM -- 09:00 PM (PDT) @ Pacific Ballroom #115

Strong worst-case performance bounds for episodic reinforcement learning exist but fortunately in practice RL algorithms perform much better than such bounds would predict. Algorithms and theory that provide strong problem-dependent bounds could help illuminate the key features of what makes a RL problem hard and reduce the barrier to using RL algorithms in practice. As a step towards this we derive an algorithm and analysis for finite horizon discrete MDPs
with state-of-the-art worst-case regret bounds and substantially tighter bounds if the RL environment has special features but without apriori knowledge of the environment from the algorithm. As a result of our analysis, we also help address an open learning theory question~\cite{jiang2018open} about episodic MDPs with a constant upper-bound on the sum of rewards, providing a regret bound function of the number of episodes with no dependence on the horizon.

Author Information

Andrea Zanette (Stanford University)
Emma Brunskill (Stanford University)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors