Decoupling Exploration and Exploitation in Reinforcement Learning
Lukas Schäfer · Filippos Christianos · Josiah Hanna · Stefano V. Albrecht
Intrinsic rewards are commonly applied to improve exploration in reinforcement learning. However, these approaches suffer from non-stationary reward shaping and strong dependency on hyperparameters. In this work, we propose Decoupled RL (DeRL), which trains separate policies for exploration and exploitation. DeRL can be applied with on-policy and off-policy RL algorithms. We evaluate DeRL algorithms in two exploration-focused environments with five types of intrinsic rewards. We show that DeRL can be more robust to scaling of intrinsic rewards and converge to the same evaluation returns as intrinsically motivated baselines in fewer interactions.
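To make the decoupling idea concrete, below is a minimal illustrative sketch, not the paper's DeRL algorithm: two tabular Q-learners share experience from a toy chain environment. The exploration learner is trained on a count-based intrinsic reward and generates behaviour; the exploitation learner is trained off-policy on extrinsic reward only and is the one evaluated. All names, the environment, and hyperparameters here are assumptions made for illustration.

```python
# Illustrative sketch of decoupling exploration and exploitation
# (NOT the paper's DeRL algorithm): the exploration Q-learner acts and is
# updated with a count-based intrinsic bonus; the exploitation Q-learner
# learns off-policy from the same transitions using extrinsic reward only.
import numpy as np

N_STATES, N_ACTIONS, HORIZON = 10, 2, 20
rng = np.random.default_rng(0)

def step(state, action):
    """Toy chain: action 1 moves right, action 0 resets; extrinsic reward only at the far end."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else 0
    extrinsic = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, extrinsic

q_explore = np.zeros((N_STATES, N_ACTIONS))   # trained on intrinsic reward
q_exploit = np.zeros((N_STATES, N_ACTIONS))   # trained on extrinsic reward
visit_counts = np.zeros(N_STATES)
alpha, gamma, eps = 0.5, 0.95, 0.1            # assumed hyperparameters

for episode in range(500):
    state = 0
    for t in range(HORIZON):
        # Behaviour comes from the exploration policy (epsilon-greedy here).
        if rng.random() < eps:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(q_explore[state]))

        next_state, extrinsic = step(state, action)
        visit_counts[next_state] += 1
        intrinsic = 1.0 / np.sqrt(visit_counts[next_state])  # count-based bonus

        # Exploration learner: intrinsic reward drives its updates.
        td_explore = intrinsic + gamma * q_explore[next_state].max() - q_explore[state, action]
        q_explore[state, action] += alpha * td_explore

        # Exploitation learner: same transition, extrinsic reward only (off-policy).
        td_exploit = extrinsic + gamma * q_exploit[next_state].max() - q_exploit[state, action]
        q_exploit[state, action] += alpha * td_exploit

        state = next_state

# Evaluation uses the exploitation policy, which never saw intrinsic rewards,
# so changing the intrinsic reward scale does not bias the evaluated returns.
print("greedy exploitation actions per state:", np.argmax(q_exploit, axis=1))
```

The point of the separation is visible in the last comment: only the exploration learner depends on the intrinsic reward, so the evaluated (exploitation) policy is insulated from non-stationary reward shaping and from the intrinsic reward scale.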
Author Information
Lukas Schäfer (University of Edinburgh)
Filippos Christianos (University of Edinburgh)
Josiah Hanna (UT Austin)
Stefano V. Albrecht (University of Edinburgh)
More from the Same Authors
- 2021 Poster: Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing »
  Filippos Christianos · Georgios Papoudakis · Muhammad Arrasy Rahman · Stefano V. Albrecht
- 2021 Poster: Towards Open Ad Hoc Teamwork Using Graph-based Policy Learning »
  Muhammad Arrasy Rahman · Niklas Hopner · Filippos Christianos · Stefano V. Albrecht
- 2021 Spotlight: Towards Open Ad Hoc Teamwork Using Graph-based Policy Learning »
  Muhammad Arrasy Rahman · Niklas Hopner · Filippos Christianos · Stefano V. Albrecht
- 2021 Spotlight: Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing »
  Filippos Christianos · Georgios Papoudakis · Muhammad Arrasy Rahman · Stefano V. Albrecht
- 2019 Poster: Importance Sampling Policy Evaluation with an Estimated Behavior Policy »
  Josiah Hanna · Scott Niekum · Peter Stone
- 2019 Oral: Importance Sampling Policy Evaluation with an Estimated Behavior Policy »
  Josiah Hanna · Scott Niekum · Peter Stone