Poster
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Andrea Zanette · Emma Brunskill
Pacific Ballroom #115
Keywords: [ Theory and Algorithms ]
Strong worst-case performance bounds for episodic reinforcement learning exist, but fortunately, in practice RL algorithms often perform much better than such bounds would predict. Algorithms and theory that provide strong problem-dependent bounds could help illuminate the key features that make an RL problem hard and reduce the barrier to using RL algorithms in practice. As a step towards this goal,
we derive an algorithm and analysis for finite-horizon discrete MDPs that achieves state-of-the-art worst-case regret bounds, and substantially tighter bounds when the RL environment has special features, without requiring the algorithm to have a priori knowledge of the environment. As a result of our analysis,
we also help address an open learning theory question (Jiang and Agarwal, 2018) about episodic MDPs with a constant upper bound on the sum of rewards per episode, providing a regret bound that is a function of the number of episodes with no dependence on the horizon.
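To make the last claim concrete, a horizon-free bound of this kind would take roughly the following shape; this is an illustrative paraphrase rather than the paper's exact theorem statement, with $K$, $S$, and $A$ denoting the number of episodes, states, and actions:
\[
% Illustrative only: when the total reward in each episode is bounded
% by a constant, the leading regret term carries no polynomial
% dependence on the horizon H (up to logarithmic factors):
\mathrm{Regret}(K) \;=\; \widetilde{O}\!\left(\sqrt{S A K}\right).
\]
By contrast, worst-case regret bounds for episodic MDPs generally scale polynomially with the horizon $H$, so removing that dependence under a bounded total reward is the substance of the open question addressed.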