In real-world multi-agent systems, agents with different capabilities may join or leave without altering the team's overarching goals. Coordinating teams with such dynamic composition is challenging: the optimal team strategy varies with the composition. We propose COPA, a coach-player framework to tackle this problem. We assume the coach has a global view of the environment and coordinates the players, who only have partial views, by distributing individual strategies. Specifically, we 1) adopt the attention mechanism for both the coach and the players; 2) propose a variational objective to regularize learning; and 3) design an adaptive communication method to let the coach decide when to communicate with the players. We validate our methods on a resource collection task, a rescue game, and the StarCraft micromanagement tasks. We demonstrate zero-shot generalization to new team compositions. Our method achieves comparable or better performance than the setting where all players have a full view of the environment. Moreover, we see that the performance remains high even when the coach communicates as little as 13% of the time using the adaptive communication strategy.
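As a rough illustration of the abstract's ideas (not the paper's actual implementation), the coach's attention-based strategy distribution and the adaptive communication gate might be sketched as follows. All class and parameter names, the random weight matrices, and the drift-threshold communication rule are illustrative stand-ins:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class Coach:
    """Toy coach: attends over all agents' global-view features and emits
    one strategy vector per agent. Weights are random stand-ins, not
    learned parameters."""
    def __init__(self, feat_dim, strat_dim, threshold=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.Wq = rng.standard_normal((feat_dim, strat_dim))
        self.Wk = rng.standard_normal((feat_dim, strat_dim))
        self.Wv = rng.standard_normal((feat_dim, strat_dim))
        self.threshold = threshold        # adaptive-communication cutoff
        self.last_strategies = None       # cache of last broadcast strategies

    def strategies(self, global_feats):
        # Scaled dot-product self-attention over the team
        # (global_feats: n_agents x feat_dim), so the output naturally
        # handles a varying number of agents.
        q = global_feats @ self.Wq
        k = global_feats @ self.Wk
        v = global_feats @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        return attn @ v                   # one strategy vector per agent

    def communicate(self, global_feats):
        """Broadcast new strategies only when they drift past the threshold."""
        new = self.strategies(global_feats)
        if self.last_strategies is None or \
           np.linalg.norm(new - self.last_strategies) > self.threshold:
            self.last_strategies = new
            return new, True              # coach talks to the players
        return self.last_strategies, False  # players reuse cached strategies
```

Because the attention weights are computed over however many agent rows are present, the same coach applies unchanged when agents join or leave; the gate in `communicate` is one simple way the coach could decide when a broadcast is worthwhile.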
Author Information
Bo Liu (University of Texas, Austin)
Qiang Liu (UT Austin)
Peter Stone (The University of Texas at Austin)
Animesh Garg (University of Toronto, Vector Institute, Nvidia)
Yuke Zhu (University of Texas - Austin)
Anima Anandkumar (California Institute of Technology)
Related Events (a corresponding poster, oral, or spotlight)
2021 Oral: Coach-Player Multi-agent Reinforcement Learning for Dynamic Team Composition »
Tue. Jul 20th 02:00 -- 02:20 PM
More from the Same Authors
2021 : Auditing AI models for Verified Deployment under Semantic Specifications »
Homanga Bharadhwaj · De-An Huang · Chaowei Xiao · Anima Anandkumar · Animesh Garg -
2021 : Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning »
Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang -
2021 : Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings »
Shunshi Zhang · Murat Erdogdu · Animesh Garg -
2021 : Learning by Watching: Physical Imitation of Manipulation Skills from Human Videos »
Haoyu Xiong · Yun-Chun Chen · Homanga Bharadhwaj · Samarth Sinha · Animesh Garg -
2022 : VIPer: Iterative Value-Aware Model Learning on the Value Improvement Path »
Romina Abachi · Claas Voelcker · Animesh Garg · Amir-massoud Farahmand -
2022 : Model-Based Meta Automatic Curriculum Learning »
Zifan Xu · Yulin Zhang · Shahaf Shperberg · Reuth Mirsky · Yuqian Jiang · Bo Liu · Peter Stone -
2022 : MoCoDA: Model-based Counterfactual Data Augmentation »
Silviu Pitis · Elliot Creager · Ajay Mandlekar · Animesh Garg -
2022 : Task Factorization in Curriculum Learning »
Reuth Mirsky · Shahaf Shperberg · Yulin Zhang · Zifan Xu · Yuqian Jiang · Jiaxun Cui · Peter Stone -
2023 Poster: VIMA: General Robot Manipulation with Multimodal Prompts »
Yunfan Jiang · Agrim Gupta · Zichen Zhang · Guanzhi Wang · Yongqiang Dou · Yanjun Chen · Li Fei-Fei · Anima Anandkumar · Yuke Zhu · Jim Fan -
2022 : Q/A: Invited Speaker: Peter Stone »
Peter Stone -
2022 : Invited Speaker: Peter Stone »
Peter Stone -
2022 Poster: Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics »
Matthias Weissenbacher · Samarth Sinha · Animesh Garg · Yoshinobu Kawahara -
2022 Spotlight: Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics »
Matthias Weissenbacher · Samarth Sinha · Animesh Garg · Yoshinobu Kawahara -
2022 Poster: Centroid Approximation for Bootstrap: Improving Particle Quality at Inference »
Mao Ye · Qiang Liu -
2022 Poster: Causal Dynamics Learning for Task-Independent State Abstraction »
Zizhao Wang · Xuesu Xiao · Zifan Xu · Yuke Zhu · Peter Stone -
2022 Poster: How to Fill the Optimum Set? Population Gradient Descent with Harmless Diversity »
Chengyue Gong · · Qiang Liu -
2022 Oral: Causal Dynamics Learning for Task-Independent State Abstraction »
Zizhao Wang · Xuesu Xiao · Zifan Xu · Yuke Zhu · Peter Stone -
2022 Spotlight: How to Fill the Optimum Set? Population Gradient Descent with Harmless Diversity »
Chengyue Gong · · Qiang Liu -
2022 Spotlight: Centroid Approximation for Bootstrap: Improving Particle Quality at Inference »
Mao Ye · Qiang Liu -
2022 Poster: A Langevin-like Sampler for Discrete Distributions »
Ruqi Zhang · Xingchao Liu · Qiang Liu -
2022 Spotlight: A Langevin-like Sampler for Discrete Distributions »
Ruqi Zhang · Xingchao Liu · Qiang Liu -
2021 Poster: Principled Exploration via Optimistic Bootstrapping and Backward Induction »
Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang -
2021 Poster: Value Iteration in Continuous Actions, States and Time »
Michael Lutter · Shie Mannor · Jan Peters · Dieter Fox · Animesh Garg -
2021 Spotlight: Value Iteration in Continuous Actions, States and Time »
Michael Lutter · Shie Mannor · Jan Peters · Dieter Fox · Animesh Garg -
2021 Spotlight: Principled Exploration via Optimistic Bootstrapping and Backward Induction »
Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang -
2021 Poster: AlphaNet: Improved Training of Supernets with Alpha-Divergence »
Dilin Wang · Chengyue Gong · Meng Li · Qiang Liu · Vikas Chandra -
2021 Poster: SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies »
Jim Fan · Guanzhi Wang · De-An Huang · Zhiding Yu · Li Fei-Fei · Yuke Zhu · Anima Anandkumar -
2021 Oral: AlphaNet: Improved Training of Supernets with Alpha-Divergence »
Dilin Wang · Chengyue Gong · Meng Li · Qiang Liu · Vikas Chandra -
2021 Spotlight: SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies »
Jim Fan · Guanzhi Wang · De-An Huang · Zhiding Yu · Li Fei-Fei · Yuke Zhu · Anima Anandkumar -
2021 Poster: Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning »
Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviychuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar -
2021 Spotlight: Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning »
Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviychuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar -
2020 Poster: Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection »
Mao Ye · Chengyue Gong · Lizhen Nie · Denny Zhou · Adam Klivans · Qiang Liu -
2020 Poster: Semi-Supervised StyleGAN for Disentanglement Learning »
Weili Nie · Tero Karras · Animesh Garg · Shoubhik Debnath · Anjul Patney · Ankit Patel · Anima Anandkumar -
2020 Poster: Go Wide, Then Narrow: Efficient Training of Deep Thin Networks »
Denny Zhou · Mao Ye · Chen Chen · Tianjian Meng · Mingxing Tan · Xiaodan Song · Quoc Le · Qiang Liu · Dale Schuurmans -
2020 Poster: Reducing Sampling Error in Batch Temporal Difference Learning »
Brahma Pavse · Ishan Durugkar · Josiah Hanna · Peter Stone -
2020 Poster: Accountable Off-Policy Evaluation With Kernel Bellman Statistics »
Yihao Feng · Tongzheng Ren · Ziyang Tang · Qiang Liu -
2020 Poster: A Chance-Constrained Generative Framework for Sequence Optimization »
Xianggen Liu · Qiang Liu · Sen Song · Jian Peng -
2020 Poster: Angular Visual Hardness »
Beidi Chen · Weiyang Liu · Zhiding Yu · Jan Kautz · Anshumali Shrivastava · Animesh Garg · Anima Anandkumar -
2019 : Peter Stone: Learning Curricula for Transfer Learning in RL »
Peter Stone -
2019 Workshop: Stein’s Method for Machine Learning and Statistics »
Francois-Xavier Briol · Lester Mackey · Chris Oates · Qiang Liu · Larry Goldstein -
2019 : panel discussion with Craig Boutilier (Google Research), Emma Brunskill (Stanford), Chelsea Finn (Google Brain, Stanford, UC Berkeley), Mohammad Ghavamzadeh (Facebook AI), John Langford (Microsoft Research) and David Silver (Deepmind) »
Peter Stone · Craig Boutilier · Emma Brunskill · Chelsea Finn · John Langford · David Silver · Mohammad Ghavamzadeh -
2019 : Invited Talk 1: Adaptive Tolling for Multiagent Traffic Optimization »
Peter Stone -
2019 Poster: Improving Neural Language Modeling via Adversarial Training »
Dilin Wang · Chengyue Gong · Qiang Liu -
2019 Oral: Improving Neural Language Modeling via Adversarial Training »
Dilin Wang · Chengyue Gong · Qiang Liu -
2019 Poster: Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization »
Chengyue Gong · Jian Peng · Qiang Liu -
2019 Poster: Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models »
Dilin Wang · Qiang Liu -
2019 Oral: Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization »
Chengyue Gong · Jian Peng · Qiang Liu -
2019 Oral: Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models »
Dilin Wang · Qiang Liu -
2019 Poster: Importance Sampling Policy Evaluation with an Estimated Behavior Policy »
Josiah Hanna · Scott Niekum · Peter Stone -
2019 Oral: Importance Sampling Policy Evaluation with an Estimated Behavior Policy »
Josiah Hanna · Scott Niekum · Peter Stone -
2018 Poster: Learning to Explore via Meta-Policy Gradient »
Tianbing Xu · Qiang Liu · Liang Zhao · Jian Peng -
2018 Poster: Stein Variational Gradient Descent Without Gradient »
Jun Han · Qiang Liu -
2018 Oral: Stein Variational Gradient Descent Without Gradient »
Jun Han · Qiang Liu -
2018 Oral: Learning to Explore via Meta-Policy Gradient »
Tianbing Xu · Qiang Liu · Liang Zhao · Jian Peng -
2018 Poster: Goodness-of-fit Testing for Discrete Distributions via Stein Discrepancy »
Jiasen Yang · Qiang Liu · Vinayak A Rao · Jennifer Neville -
2018 Poster: Stein Variational Message Passing for Continuous Graphical Models »
Dilin Wang · Zhe Zeng · Qiang Liu -
2018 Oral: Goodness-of-fit Testing for Discrete Distributions via Stein Discrepancy »
Jiasen Yang · Qiang Liu · Vinayak A Rao · Jennifer Neville -
2018 Oral: Stein Variational Message Passing for Continuous Graphical Models »
Dilin Wang · Zhe Zeng · Qiang Liu -
2017 Poster: Data-Efficient Policy Evaluation Through Behavior Policy Search »
Josiah Hanna · Philip S. Thomas · Peter Stone · Scott Niekum -
2017 Talk: Data-Efficient Policy Evaluation Through Behavior Policy Search »
Josiah Hanna · Philip S. Thomas · Peter Stone · Scott Niekum