Poster
Taylor Expansion of Discount Factors
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko

Tue Jul 20 09:00 PM -- 11:00 PM (PDT) @ Virtual

In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective. In this work, we study the effect that this discrepancy of discount factors has during learning, and discover a family of objectives that interpolate between the value functions of two distinct discount factors. Our analysis suggests new ways of estimating value functions and performing policy optimization updates, which yield empirical performance gains. This framework also leads to new insights into commonly used deep RL heuristic modifications to policy optimization algorithms.
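The abstract does not spell out the interpolation, so the following is only a minimal sketch of one way such a family can arise, derived here from the standard matrix form of policy evaluation rather than quoted from the authors. It assumes a fixed policy with transition matrix `P` and reward vector `r`, so that V_γ = (I - γP)^{-1} r. Expanding (I - γ'P)^{-1} around γ gives V_γ' = Σ_k ((γ' - γ)(I - γP)^{-1} P)^k V_γ, and truncating the sum at order K yields a vector between V_γ (K = 0) and V_γ' (K → ∞). The function names, the random 5-state chain, and the values γ = 0.9, γ' = 0.99 are all hypothetical choices for illustration.

```python
import numpy as np


def value_function(P, r, gamma):
    """Exact value function V_gamma = (I - gamma P)^{-1} r for a fixed policy."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)


def taylor_interpolation(P, r, gamma, gamma_prime, order):
    """Order-K truncation of the expansion of V_gamma' around gamma.

    Returns sum_{k=0}^{order} M^k V_gamma, where
    M = (gamma' - gamma) (I - gamma P)^{-1} P.
    order = 0 gives V_gamma; large orders approach V_gamma'.
    """
    n = P.shape[0]
    v_gamma = value_function(P, r, gamma)
    M = (gamma_prime - gamma) * np.linalg.solve(np.eye(n) - gamma * P, P)
    term = v_gamma.copy()
    total = v_gamma.copy()
    for _ in range(order):
        term = M @ term          # next term M^k V_gamma
        total = total + term     # running partial sum
    return total


# Hypothetical 5-state Markov chain induced by some fixed policy.
rng = np.random.default_rng(0)
P = rng.random((5, 5))
P /= P.sum(axis=1, keepdims=True)   # make each row a probability distribution
r = rng.random(5)                   # nonnegative rewards

gamma, gamma_prime = 0.9, 0.99
target = value_function(P, r, gamma_prime)

# Max-norm distance to V_gamma' for increasing truncation orders.
errors = [
    np.max(np.abs(taylor_interpolation(P, r, gamma, gamma_prime, k) - target))
    for k in (0, 1, 5, 50)
]
print(errors)
```

In this sketch the series converges because each row of `M` sums to (γ' - γ)/(1 - γ), which is below 1 whenever γ < γ' < 1, so higher orders move the estimate steadily from the value function of the smaller discount factor toward that of the larger one.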

Author Information

Yunhao Tang (Columbia University)
Mark Rowland (DeepMind)
Remi Munos (DeepMind)
Michal Valko (DeepMind / Inria / ENS Paris-Saclay)

More from the Same Authors