Large Batch Experience Replay
Thibault Lahire · Matthieu Geist · Emmanuel Rachelson

Thu Jul 21 03:00 PM -- 05:00 PM (PDT) @ Hall E #921

Several algorithms have been proposed to sample the replay buffer of deep Reinforcement Learning (RL) agents non-uniformly in order to speed up learning, but few theoretical foundations for these sampling schemes have been provided. Among others, Prioritized Experience Replay appears as a hyperparameter-sensitive heuristic, even though it can provide good performance. In this work, we cast the replay buffer sampling problem as an importance sampling one for estimating the gradient. This allows deriving the theoretically optimal sampling distribution, yielding the best theoretical convergence speed. Elaborating on the knowledge of the ideal sampling scheme, we exhibit new theoretical foundations for Prioritized Experience Replay. Since the optimal sampling distribution is intractable, we make several approximations that provide good results in practice and introduce, among others, LaBER (Large Batch Experience Replay), an easy-to-code and efficient method for sampling the replay buffer. LaBER, which can be combined with Deep Q-Networks, distributional RL agents or actor-critic methods, yields improved performance over a diverse range of Atari games and PyBullet environments, compared to the base agent it is implemented on and to other prioritization schemes.
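The abstract does not spell out the algorithm, so the following is only a minimal sketch of the importance-sampling idea it describes, not the authors' implementation. It assumes a large batch has already been drawn uniformly from the replay buffer, that each transition in it carries a strictly positive surrogate priority (for example an absolute TD error, which is an assumption here), and that a smaller mini-batch is then drawn with probability proportional to those priorities. The function name `laber_style_minibatch` and its parameters are hypothetical.

```python
import numpy as np


def laber_style_minibatch(priorities, batch_size, rng):
    """Down-sample a large batch to a mini-batch by importance sampling.

    priorities: strictly positive surrogate priorities, one per transition
        of the large batch (assumed here, e.g. |TD error|).
    batch_size: number of transitions to keep.
    rng: a numpy.random.Generator.

    Returns (idx, weights): indices into the large batch and importance
    weights such that mean(weights * g[idx]) is an unbiased estimate of
    mean(g) over the large batch, for any per-transition quantity g.
    """
    priorities = np.asarray(priorities, dtype=np.float64)
    n = len(priorities)
    # Sampling distribution proportional to the priorities.
    probs = priorities / priorities.sum()
    idx = rng.choice(n, size=batch_size, replace=True, p=probs)
    # Importance weight 1 / (n * p_i) corrects for non-uniform sampling.
    weights = 1.0 / (n * probs[idx])
    return idx, weights


# Hypothetical usage: 256 uniformly drawn transitions reduced to 32.
rng = np.random.default_rng(0)
surrogate = np.abs(rng.normal(size=256)) + 1e-6  # stand-in priorities
idx, weights = laber_style_minibatch(surrogate, 32, rng)
```

In a training loop, the per-sample losses of the selected transitions would be multiplied by `weights` before averaging. When the priorities are exactly proportional to the magnitude of the quantity being averaged, every weighted term is identical and the estimate has zero variance, which is the intuition behind sampling in proportion to gradient magnitude.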

Author Information

Thibault Lahire (Université de Toulouse, ISAE-SUPAERO)
Matthieu Geist (Google)
Emmanuel Rachelson (ISAE-SUPAERO)

Dr. Emmanuel Rachelson is an associate professor in Machine Learning and Artificial Intelligence at ISAE-SUPAERO. He founded the Data and Decision Sciences Master-level curriculum, which he now oversees. He is also the leader of the ISAE-SUPAERO Reinforcement Learning Initiative. He graduated from ISAE-SUPAERO and received an MS in Artificial Intelligence from University Paul Sabatier in 2005. He received a PhD in Artificial Intelligence from the University of Toulouse in 2009. His research focuses on robust sequential decision making under uncertainty and he specializes in Reinforcement Learning, while keeping strong connections with the broader fields of Machine Learning and Operations Research. Among his current research interests are Robustness and Dependability in (Deep) Reinforcement Learning and Monte Carlo Tree Search. He has contributed to applications in energy management, UAV planning and control, robotics, satellite systems (radio resource management or imaging tasks), air traffic management, and aircraft design. One specific focus is on how Reinforcement Learning principles can be applied to control optimization processes under resource constraints.
