In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in the Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach.
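The abstract names V-trace but does not give its formula. As a rough illustration only, the sketch below computes value targets with truncated importance weights for a single trajectory, which is how the V-trace correction is commonly described. The function name `vtrace_targets`, its argument names, and the default clipping thresholds `rho_bar` and `c_bar` are assumptions made for this example, not details taken from this page.

```python
import math


def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Sketch of V-trace-style value targets for one trajectory of length T.

    behaviour_logp / target_logp: log-probabilities of the actions taken,
    under the actor (behaviour) policy and the learner (target) policy.
    values: learner value estimates V(x_t) for t = 0..T-1.
    bootstrap_value: value estimate for the state after the last step.
    """
    T = len(rewards)
    # Importance ratios pi(a|x) / mu(a|x), computed from log-probabilities.
    ratios = [math.exp(t - b) for t, b in zip(target_logp, behaviour_logp)]
    rhos = [min(rho_bar, r) for r in ratios]  # clipped weights on TD errors
    cs = [min(c_bar, r) for r in ratios]      # clipped "trace" coefficients
    next_values = list(values[1:]) + [bootstrap_value]
    # Importance-weighted temporal-difference errors.
    deltas = [rhos[t] * (rewards[t] + gamma * next_values[t] - values[t])
              for t in range(T)]
    # Backward recursion:
    # v_t - V(x_t) = delta_t + gamma * c_t * (v_{t+1} - V(x_{t+1})).
    acc = 0.0
    targets = [0.0] * T
    for t in reversed(range(T)):
        acc = deltas[t] + gamma * cs[t] * acc
        targets[t] = values[t] + acc
    return targets
```

When the actor and learner policies agree, every ratio equals 1 and the targets reduce to ordinary n-step bootstrapped returns. The clipping only takes effect when the actors' policy lags behind the learner's, which is the situation that decoupled acting and learning creates.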
Fri Jul 13 12:50 AM -- 01:10 AM (PDT) @ A1
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures