IEC: When Information-Driven Exploration Meets Spectral Consensus via Primal–Dual Reward Regularization in Decentralized Multi-Agent RL
Abstract
Decentralized multi-agent reinforcement learning faces a persistent exploration–coordination tension: intrinsic rewards promote exploration under sparse feedback, yet effective cooperation requires agents’ behaviors to remain consistent over a limited communication graph. Existing methods often combine exploration bonuses and coordination regularizers under fixed-weight schedules, which makes them hard to tune and prone to either fragmented conventions or premature behavioral collapse. We propose IEC (Information-driven Exploration–Consensus), a framework that couples exploration and coordination through a single constrained objective: maximize task return augmented with two complementary exploration signals (dynamics-based information gain and state-coverage novelty), subject to a bound on graph-induced policy disagreement. The bound is enforced by a spectral smoothness penalty between neighboring agents’ policies, which can be interpreted as a Dirichlet-energy regularizer on the communication graph. IEC optimizes the resulting Lagrangian with a lightweight primal–dual update that adapts the consensus multiplier to observed constraint violations, yielding an automatic shift from diverse exploration to stable cooperative conventions. Across three distinct benchmarks, IEC consistently achieves superior performance.
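For concreteness, a minimal sketch of the constrained objective and its primal–dual update follows; the notation here (policy embeddings $f(\theta_i)$, bonus weights $\beta_{I}$ and $\beta_{N}$, disagreement tolerance $\epsilon$, and step sizes $\eta_{\theta}$, $\eta_{\lambda}$) is an illustrative assumption, not the paper's exact formulation.
\[
\max_{\theta}\; J(\theta) = \mathbb{E}\!\left[\sum_{t} \gamma^{t}\left(r_{t} + \beta_{I}\, r^{\mathrm{IG}}_{t} + \beta_{N}\, r^{\mathrm{nov}}_{t}\right)\right]
\quad \text{s.t.} \quad
D(\theta) = \sum_{(i,j)\in E} w_{ij}\,\bigl\lVert f(\theta_{i}) - f(\theta_{j})\bigr\rVert^{2} \le \epsilon,
\]
where $D(\theta)$ is the Dirichlet energy of the agents' policy embeddings over the communication graph $G=(V,E)$ with edge weights $w_{ij}$. The corresponding Lagrangian and primal–dual iteration would then read
\[
\mathcal{L}(\theta,\lambda) = J(\theta) - \lambda\,\bigl(D(\theta) - \epsilon\bigr),
\qquad
\theta \leftarrow \theta + \eta_{\theta}\,\nabla_{\theta}\mathcal{L}(\theta,\lambda),
\qquad
\lambda \leftarrow \bigl[\lambda + \eta_{\lambda}\,\bigl(D(\theta) - \epsilon\bigr)\bigr]_{+}.
\]
In this sketch the multiplier $\lambda$ grows whenever the Dirichlet energy $D(\theta)$ exceeds the tolerance $\epsilon$ and shrinks toward zero otherwise, which produces the automatic transition from diverse exploration to stable consensus described above.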