

Oral

Per-Decision Option Discounting

Anna Harutyunyan · Peter Vrancx · Philippe Hamel · Ann Nowe · Doina Precup

Abstract:

In order to solve complex problems, an agent must be able to reason over a sufficiently long horizon. Temporal abstraction, commonly modeled through options, offers the ability to reason at many time scales, but the horizon length is still determined by the single discount factor of the underlying Markov Decision Process. We propose a modification to the options framework that allows the agent’s horizon to grow naturally as its actions become more complex and extended in time. We show that the proposed option-step discount controls a bias-variance trade-off, with larger discounts (counter-intuitively) leading to less estimation variance.
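To make the horizon argument concrete, the following is a minimal sketch and not the authors' formulation. It assumes that the standard setting discounts the value following an option of duration k by gamma ** k, and that a per-decision scheme applies one fixed factor per option regardless of duration. The function name, the durations, and the discount value are hypothetical and chosen only for illustration.

```python
def discount_after_options(durations, gamma, per_decision=False):
    """Return the cumulative discount applied to value received after
    executing a sequence of options with the given durations.

    per_decision=False: assumed standard behaviour, gamma ** k per option
                        of duration k (one factor per primitive step).
    per_decision=True:  assumed per-decision behaviour, a single factor
                        gamma per option, independent of its duration.
    """
    factor = 1.0
    for k in durations:
        factor *= gamma if per_decision else gamma ** k
    return factor


# Three options of five primitive steps each (illustrative values only).
durations = [5, 5, 5]
gamma = 0.9

standard = discount_after_options(durations, gamma)
per_decision = discount_after_options(durations, gamma, per_decision=True)
```

Under these assumptions, the standard factor shrinks with the total number of primitive steps (15 here), while the per-decision factor shrinks only with the number of option decisions (3 here), so the same discount value reaches further in time as options get longer.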
