Per-Decision Option Discounting
Anna Harutyunyan · Peter Vrancx · Philippe Hamel · Ann Nowe · Doina Precup

Tue Jun 11th 04:30 -- 04:35 PM @ Room 104

In order to solve complex problems, an agent must be able to reason over a sufficiently long horizon. Temporal abstraction, commonly modeled through options, offers the ability to reason at many time scales, but the horizon length is still determined by the single discount factor of the underlying Markov Decision Process. We propose a modification to the options framework that allows the agent’s horizon to grow naturally as its actions become more complex and extended in time. We show that the proposed option-step discount controls a bias-variance trade-off, with larger discounts (counter-intuitively) leading to less estimation variance.
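To make the contrast in the abstract concrete, the sketch below compares a return discounted at every primitive time step with a return discounted once per option decision. This is only an illustrative reading: the abstract does not state the exact definition of the option-step discount, so the function names, the `gamma_option` parameter, and the choice to leave rewards inside an option undiscounted are all assumptions made for this example, not the authors' formulation.

```python
from typing import List


def step_discounted_return(option_rewards: List[List[float]], gamma: float) -> float:
    """Standard return: every primitive time step is discounted by gamma.

    option_rewards[k] holds the rewards collected while the k-th option ran.
    """
    total = 0.0
    t = 0  # global primitive time step
    for rewards in option_rewards:
        for r in rewards:
            total += (gamma ** t) * r
            t += 1
    return total


def per_decision_return(option_rewards: List[List[float]], gamma_option: float) -> float:
    """Hypothetical per-decision return: the discount is applied once per
    option decision, regardless of how many steps the option lasted.

    Assumption for illustration: rewards inside an option are summed
    without any within-option discounting.
    """
    total = 0.0
    for k, rewards in enumerate(option_rewards):
        total += (gamma_option ** k) * sum(rewards)
    return total


# Two option executions: the first lasts 3 steps, the second 2 steps.
trajectory = [[1.0, 1.0, 1.0], [1.0, 1.0]]
step_return = step_discounted_return(trajectory, gamma=0.5)
decision_return = per_decision_return(trajectory, gamma_option=0.5)
```

Under this reading, the reward from the second option is scaled by a single factor of `gamma_option` rather than by `gamma` raised to the number of elapsed steps, so the horizon measured in primitive steps lengthens as options become longer.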

Author Information

Anna Harutyunyan (DeepMind)
Peter Vrancx (PROWLER.io)
Philippe Hamel (DeepMind)
Ann Nowe (VU Brussel)
Doina Precup (DeepMind)
