Sparse Topology-Aware Pairwise Scoring for Large-Scale Multi-Agent Reinforcement Learning
Zhibo Deng ⋅ Feng Liang ⋅ Yong Zhang ⋅ Xiaoxi Zhang ⋅ Xiping Hu
Abstract
In multi-agent reinforcement learning (MARL), communication enables agents to mitigate partial observability and stochasticity through information sharing, but in large-scale systems the number of pairwise interactions grows quadratically with the number of agents. Previous studies often struggle to simultaneously achieve scalability and task adaptivity in large-scale multi-agent communication. To address this challenge, we propose a scalable communication scheme for large-scale MARL, termed $\textit{Sparse tOpology-aware Pairwise Scoring}$ (SOPS). We argue that scalable MARL communication requires decoupling scalability from task-adaptive link allocation. To ensure scalability, we constrain communication to an exponential-graph backbone with a small diameter, which preserves rapid information mixing while keeping the number of per-agent candidates logarithmic. On top of this constraint, we learn a task-conditioned probabilistic subgraph distribution via a pairwise scoring network over agent states and edge-type embeddings, allocating sparse links that maximize return, optimized end-to-end through differentiable Gumbel-Sigmoid reparameterization. Evaluation results show that SOPS significantly outperforms existing state-of-the-art methods across cooperative benchmarks of diverse scales and exhibits robust zero-shot transfer capabilities.
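The two ingredients named in the abstract can be illustrated with a minimal sketch: an exponential-graph candidate set of logarithmic size per agent, and relaxed Bernoulli (Gumbel-Sigmoid) gating of candidate links. This is a toy numpy illustration, not the authors' implementation; the linear pairwise scorer `W` is a stand-in for the learned scoring network, and edge-type embeddings are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def exp_graph_neighbors(i, n):
    """Candidate neighbors of agent i on an exponential-graph backbone:
    offsets 2^0, 2^1, ... (mod n), i.e. O(log n) candidates per agent."""
    return [(i + 2**k) % n for k in range(int(np.log2(n)))]

def gumbel_sigmoid(logit, tau=0.5):
    """Relaxed Bernoulli sample: add logistic (difference-of-Gumbels) noise
    to the logit and squash. Differentiable in the logit; hardens toward
    {0, 1} as tau -> 0."""
    u = rng.uniform(1e-6, 1 - 1e-6)
    noise = np.log(u) - np.log(1.0 - u)
    return 1.0 / (1.0 + np.exp(-(logit + noise) / tau))

n, d = 16, 8                                   # agents, state dimension
states = rng.normal(size=(n, d))
W = rng.normal(scale=0.1, size=(2 * d,))        # toy linear pairwise scorer

# Score only the O(n log n) backbone edges, never all O(n^2) pairs,
# and sample a soft gate in (0, 1) for each candidate link.
gates = {}
for i in range(n):
    for j in exp_graph_neighbors(i, n):
        logit = W @ np.concatenate([states[i], states[j]])
        gates[(i, j)] = gumbel_sigmoid(logit)
```

Because the gates are smooth functions of the scorer's logits, gradients of the task return can flow back through the sampled subgraph, which is the point of the Gumbel-Sigmoid reparameterization.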