Decoupling Exploration and Exploitation in Reinforcement Learning
Lukas Schäfer · Filippos Christianos · Josiah Hanna · Stefano V. Albrecht

Intrinsic rewards are commonly applied to improve exploration in reinforcement learning. However, these approaches suffer from non-stationary reward shaping and a strong dependency on hyperparameters. In this work, we propose Decoupled RL (DeRL), which trains separate policies for exploration and exploitation. DeRL can be applied with on-policy and off-policy RL algorithms. We evaluate DeRL algorithms in two exploration-focused environments with five types of intrinsic rewards. We show that DeRL can be more robust to the scaling of intrinsic rewards and converges to the same evaluation returns as intrinsically motivated baselines in fewer interactions.
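To illustrate the decoupling idea described in the abstract, below is a minimal sketch (not the authors' implementation): an exploration policy collects experience under extrinsic plus intrinsic reward, while a separate exploitation policy is trained off-policy on the same transitions using extrinsic reward only. The toy chain environment, count-based bonus, and hyperparameters are hypothetical choices for illustration.

```python
# Minimal sketch of decoupled exploration/exploitation with tabular Q-learning.
# All environment details and hyperparameters below are illustrative assumptions.
import numpy as np

n_states, n_actions = 10, 2          # toy chain MDP: move left/right, reward at the end
gamma, alpha, beta = 0.95, 0.1, 0.5  # discount, learning rate, intrinsic reward scale

q_explore = np.zeros((n_states, n_actions))    # drives behaviour (extrinsic + intrinsic)
q_exploit = np.zeros((n_states, n_actions))    # evaluated policy (extrinsic only)
visit_counts = np.ones((n_states, n_actions))  # for a count-based intrinsic bonus

def step(s, a):
    """Deterministic chain: action 1 moves right, action 0 moves left; reward 1 at the last state."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r_ext = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r_ext

rng = np.random.default_rng(0)
for episode in range(200):
    s = 0
    for t in range(50):
        # epsilon-greedy behaviour from the exploration policy
        a = rng.integers(n_actions) if rng.random() < 0.1 else int(np.argmax(q_explore[s]))
        s_next, r_ext = step(s, a)

        visit_counts[s, a] += 1
        r_int = beta / np.sqrt(visit_counts[s, a])  # count-based novelty bonus

        # exploration policy: learns from extrinsic + intrinsic reward
        target = r_ext + r_int + gamma * q_explore[s_next].max()
        q_explore[s, a] += alpha * (target - q_explore[s, a])

        # exploitation policy: trained off-policy on the same transition, extrinsic reward only
        target = r_ext + gamma * q_exploit[s_next].max()
        q_exploit[s, a] += alpha * (target - q_exploit[s, a])

        s = s_next

print("greedy exploitation policy:", np.argmax(q_exploit, axis=1))
```

Because the exploitation policy never sees the intrinsic bonus, its value estimates are not distorted by the non-stationary shaping term, which is the motivation for decoupling stated in the abstract.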

Author Information

Lukas Schäfer (University of Edinburgh)
Filippos Christianos (University of Edinburgh)
Josiah Hanna (UT Austin)
Stefano V. Albrecht (University of Edinburgh)
