

Poster in Workshop: Decision Awareness in Reinforcement Learning

Hyperbolically Discounted Advantage Estimation for Generalization in Reinforcement Learning

Nasik Muhammad Nafi · Raja Farrukh Ali · William Hsu


Abstract:

In reinforcement learning (RL), agents typically discount future rewards using an exponential scheme. However, studies have shown that humans and animals instead exhibit hyperbolic time preferences and thus discount future rewards hyperbolically. In the quest for RL agents that generalize well to previously unseen scenarios, we study the effects of hyperbolic discounting on generalization tasks and present Hyperbolic Discounting for Generalization in Reinforcement Learning (HDGenRL). We propose a hyperbolic discounting-based advantage estimation method that makes the agent aware of, and robust to, the underlying uncertainty in survival and episode duration. On the challenging RL generalization benchmark Procgen, our proposed approach achieves up to a 200% performance improvement over the PPO baseline, which uses classical exponential discounting. We also incorporate hyperbolic discounting into another generalization-specific approach (APDAC), and the results indicate a further improvement in APDAC's generalization ability. This indicates that our approach is effective as a plug-in for existing methods to aid generalization.
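To make the contrast between the two discounting schemes concrete, the following minimal sketch compares the standard exponential discount factor, gamma^t, with the standard hyperbolic form, 1 / (1 + k·t). It is not the HDGenRL advantage estimator, which the abstract does not specify; the function names and the values of `gamma` and `k` are illustrative assumptions.

```python
# Illustrative sketch only: textbook exponential vs. hyperbolic discounting.
# This is NOT the paper's advantage estimator; gamma and k are assumed values.

def exponential_discount(t, gamma=0.99):
    """Weight given to a reward t steps ahead under exponential discounting."""
    return gamma ** t


def hyperbolic_discount(t, k=0.01):
    """Weight given to a reward t steps ahead under hyperbolic discounting."""
    return 1.0 / (1.0 + k * t)


def discounted_return(rewards, discount_fn):
    """Sum of rewards, each weighted by discount_fn(step index)."""
    return sum(discount_fn(t) * r for t, r in enumerate(rewards))


if __name__ == "__main__":
    # Hyperbolic weights decay more slowly at long horizons than exponential ones.
    for t in (0, 10, 100, 1000):
        print(t, exponential_discount(t), hyperbolic_discount(t))
```

With these assumed parameters, both schemes weight an immediate reward at 1, but the hyperbolic weight retains a heavier tail for rewards far in the future.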
