Value Aggregation with Uncertainty in Online Decentralized MARL
Abstract
Multi-agent reinforcement learning (MARL) has received increasing attention for solving complex decision-making tasks. Networked MARL approaches offer a decentralized means of parameter sharing, accelerating training through value aggregation. However, existing federated aggregation schemes rely on convex averaging, which may fail to converge to global optima and can cause learning rollback in the online setting. In this paper, we formally characterize the learning rollback phenomenon that arises when value estimates with unequal uncertainty are aggregated under heterogeneous online update depths. We propose a novel adaptive global consensus (AGC) mechanism for Q-value aggregation in decentralized MARL policy evaluation, which dynamically adjusts aggregation weights based on each agent’s uncertainty. We establish theoretical guarantees of accelerated convergence and bounded learning variance, supported by empirical validation, advancing the state of the art in MARL theory.
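To make the core idea concrete, the following is a minimal Python sketch of uncertainty-weighted Q-value aggregation, under the assumption that each agent reports a scalar uncertainty for its estimate (e.g., a TD-error variance proxy). The function name `aggregate_q_values` and the inverse-uncertainty weighting rule are illustrative choices, not the paper's exact AGC algorithm.

```python
import numpy as np

def aggregate_q_values(q_tables, uncertainties, eps=1e-8):
    """Aggregate per-agent Q-value estimates with uncertainty-aware weights.

    Illustrative sketch: weights are inversely proportional to each
    agent's reported uncertainty, so agents with deeper, more reliable
    online updates dominate the consensus rather than being averaged
    down by stale peers (the learning-rollback failure mode of plain
    convex averaging).
    """
    q = np.stack(q_tables)                # (n_agents, n_states, n_actions)
    u = np.asarray(uncertainties, float)  # (n_agents,)
    w = 1.0 / (u + eps)                   # lower uncertainty -> higher weight
    w = w / w.sum()                       # normalize to a convex combination
    return np.tensordot(w, q, axes=1)     # weighted consensus Q-table

# Example: agent 0 has performed many online updates (low uncertainty),
# agent 1 few (high uncertainty); the consensus stays close to agent 0.
q0 = np.ones((4, 2)) * 2.0
q1 = np.ones((4, 2)) * 0.5
consensus = aggregate_q_values([q0, q1], uncertainties=[0.1, 1.0])
```

With equal weights this example would roll agent 0's estimates back toward the less-trained agent 1; the uncertainty-aware weights avoid that while still producing a convex combination.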