This paper introduces GEAR, a distributed, GPU-centric experience replay system designed for scalable reinforcement learning (RL) with large sequence models such as transformers. With such models, existing systems such as Reverb face considerable bottlenecks in memory, computation, and communication. GEAR improves memory efficiency by using the memory resources of GPU servers (both host memory and device memory) to manage trajectory data. It also lets decentralized GPU devices accelerate various trajectory selection strategies, circumventing computational bottlenecks. To improve communication efficiency, GEAR provides GPU kernels that collect trajectories using zero-copy access to host memory and remote direct memory access (RDMA) over InfiniBand. Cluster experiments show that GEAR achieves up to 6× higher performance than Reverb when training state-of-the-art large RL models. GEAR is open-sourced at https://github.com/bigrl-team/gear.
Author Information
Hanjing Wang (Shanghai Jiao Tong University)
Man-Kit Sit (The University of Edinburgh)
Congjie He (Informatics Forum, University of Edinburgh)
Ying Wen (Shanghai Jiao Tong University)
Weinan Zhang (Shanghai Jiao Tong University)
Jun Wang (University College London)
Yaodong Yang (Huawei UK)
Luo Mai (University of Edinburgh)
More from the Same Authors
- 2023 Poster: MANSA: Learning Fast and Slow in Multi-Agent Systems
  David Mguni · Haojun Chen · Taher Jafferjee · Jianhong Wang · Longfei Yue · Xidong Feng · Stephen Mcaleer · Feifei Tong · Jun Wang · Yaodong Yang
- 2023 Poster: Regret-Minimizing Double Oracle for Extensive-Form Games
  Xiaohang Tang · Le Cong Dinh · Stephen Mcaleer · Yaodong Yang
- 2023 Poster: A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems
  Oliver Slumbers · David Mguni · Stefano Blumberg · Stephen Mcaleer · Yaodong Yang · Jun Wang
- 2023 Poster: Cooperative Open-ended Learning Framework for Zero-Shot Coordination
  Yang Li · Shao Zhang · Jichen Sun · Yali Du · Ying Wen · Xinbing Wang · Wei Pan
- 2022 Poster: Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization
  Minghuan Liu · Zhengbang Zhu · Yuzheng Zhuang · Weinan Zhang · Jianye Hao · Yong Yu · Jun Wang
- 2022 Spotlight: Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization
  Minghuan Liu · Zhengbang Zhu · Yuzheng Zhuang · Weinan Zhang · Jianye Hao · Yong Yu · Jun Wang
- 2022 Poster: Greedy when Sure and Conservative when Uncertain about the Opponents
  Haobo Fu · Ye Tian · Hongxiang Yu · Weiming Liu · Shuang Wu · Jiechao Xiong · Ying Wen · Kai Li · Junliang Xing · Qiang Fu · Wei Yang
- 2022 Spotlight: Greedy when Sure and Conservative when Uncertain about the Opponents
  Haobo Fu · Ye Tian · Hongxiang Yu · Weiming Liu · Shuang Wu · Jiechao Xiong · Ying Wen · Kai Li · Junliang Xing · Qiang Fu · Wei Yang
- 2021 Poster: Modelling Behavioural Diversity for Learning in Open-Ended Games
  Nicolas Perez-Nieves · Yaodong Yang · Oliver Slumbers · David Mguni · Ying Wen · Jun Wang
- 2021 Oral: Modelling Behavioural Diversity for Learning in Open-Ended Games
  Nicolas Perez-Nieves · Yaodong Yang · Oliver Slumbers · David Mguni · Ying Wen · Jun Wang
- 2020 Poster: Multi-Agent Determinantal Q-Learning
  Yaodong Yang · Ying Wen · Jun Wang · Liheng Chen · Kun Shao · David Mguni · Weinan Zhang
- 2020 Poster: Bidirectional Model-based Policy Optimization
  Hang Lai · Jian Shen · Weinan Zhang · Yong Yu
- 2019 Poster: Lipschitz Generative Adversarial Nets
  Zhiming Zhou · Jiadong Liang · Yuxuan Song · Lantao Yu · Hongwei Wang · Weinan Zhang · Yong Yu · Zhihua Zhang
- 2019 Oral: Lipschitz Generative Adversarial Nets
  Zhiming Zhou · Jiadong Liang · Yuxuan Song · Lantao Yu · Hongwei Wang · Weinan Zhang · Yong Yu · Zhihua Zhang
- 2018 Poster: Path-Level Network Transformation for Efficient Architecture Search
  Han Cai · Jiacheng Yang · Weinan Zhang · Song Han · Yong Yu
- 2018 Poster: Mean Field Multi-Agent Reinforcement Learning
  Yaodong Yang · Rui Luo · Minne Li · Ming Zhou · Weinan Zhang · Jun Wang
- 2018 Oral: Mean Field Multi-Agent Reinforcement Learning
  Yaodong Yang · Rui Luo · Minne Li · Ming Zhou · Weinan Zhang · Jun Wang
- 2018 Oral: Path-Level Network Transformation for Efficient Architecture Search
  Han Cai · Jiacheng Yang · Weinan Zhang · Song Han · Yong Yu