TACTIC: Task-Aware Sparse Coordination Graphs for Multi-Task Multi-Agent Reinforcement Learning
Abstract
Value factorization eases non-stationarity in multi-agent reinforcement learning (MARL), but its static coordination assumptions hinder generalization on long-horizon tasks with shifting inter-agent dependencies. Prior VQ-VAE-based methods abstract trajectories yet fail to capture how those dependencies vary over time. We present TACTIC, a centralized-training decentralized-execution (CTDE) framework with three advances: (i) hierarchical goal decomposition that guides exploration under sparse rewards; (ii) dynamic sparse coordination graphs that adapt inter-agent dependencies via variance-based TD-error pruning; and (iii) a semantic-conditioned VQ-VAE that discretizes trajectories into coordination classes, maps them to graph-level edge decisions, and conditions local policies. A pretrained, frozen goal predictor decouples task recognition from control, preventing gradient interference across coordination abstractions. On SMAC and SUMO, TACTIC achieves state-of-the-art coordination performance and transfer under sparse rewards and dynamic task structure.
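To make the edge-pruning criterion concrete, the following is a minimal sketch of variance-based TD-error pruning over a coordination graph. The function name `prune_edges`, the `(T, N, N)` pairwise TD-error layout, and the rule of retaining the highest-variance edges are illustrative assumptions for exposition, not TACTIC's actual implementation; the abstract does not fix these details.

```python
import numpy as np

def prune_edges(td_errors: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Variance-based TD-error pruning for a sparse coordination graph.

    td_errors: array of shape (T, N, N) holding pairwise TD errors for
    N agents over a window of T recent steps (an assumed layout).

    Returns a boolean adjacency mask keeping the top `keep_ratio`
    fraction of undirected edges ranked by TD-error variance. High
    variance is taken here as a proxy for an active, time-varying
    dependency -- an assumption, since the pruning direction is not
    specified in the abstract.
    """
    T, N, _ = td_errors.shape
    var = td_errors.var(axis=0)              # (N, N) per-edge variance
    iu = np.triu_indices(N, k=1)             # undirected edges, no self-loops
    edge_var = var[iu]
    k = max(1, int(keep_ratio * edge_var.size))
    thresh = np.partition(edge_var, -k)[-k]  # k-th largest variance
    keep = edge_var >= thresh
    mask = np.zeros((N, N), dtype=bool)
    mask[iu[0][keep], iu[1][keep]] = True
    return mask | mask.T                     # symmetric adjacency mask

# usage: mask = prune_edges(np.random.randn(128, 8, 8), keep_ratio=0.25)
```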