Secure Multi-agent Reinforcement Learning for Service Systems with Affinity and Byzantine Nodes: Stability Analysis and Protection Design
Abstract
We study decentralized multi-agent reinforcement learning (MARL) for networked service systems with affinity in the presence of Byzantine nodes. How a server processes a job depends on an affinity state that captures the correlation between the job and the server. Each node learns a local control policy via an actor-critic algorithm with linear function approximation over an inherently unbounded space of traffic states, while exchanging parameter information with neighbors through a communication graph. A set of Byzantine agents can exploit the unbounded state space and the resulting stochastic variance to compromise the consensus mechanism, destabilizing both the learning and queueing processes. To address this vulnerability, we propose a resilient consensus-based MARL algorithm with momentum-based smoothing, which mitigates adversarial parameter manipulation and guarantees traffic stability under mild assumptions. We prove that the cooperative agents' policies converge almost surely to a bounded neighborhood of a stationary solution of the global objective. We demonstrate the effectiveness and generality of the proposed framework on several representative service systems, including semantic routing for large language model serving, distributed polling in cloud computing, and smart manufacturing logistics.
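To make the resilient consensus idea concrete, the following is a minimal sketch of one plausible realization: each agent aggregates neighbor parameters with a coordinate-wise trimmed mean (screening out up to `f` Byzantine neighbors) and applies momentum-based smoothing to damp abrupt adversarial jumps. The class name, the choice of trimmed-mean screening, and all hyperparameters (`beta`, `f`, `lr`) are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def trimmed_mean(params, f):
    """Coordinate-wise trimmed mean: drop the f largest and f smallest
    values in each coordinate before averaging. params: (n, d) array."""
    s = np.sort(params, axis=0)
    return s[f:len(params) - f].mean(axis=0)

class ResilientConsensusAgent:
    """One cooperative agent: screened consensus + momentum smoothing
    + a local actor-critic gradient step (gradient supplied externally)."""

    def __init__(self, dim, beta=0.9, f=1, lr=0.01):
        self.theta = np.zeros(dim)   # local policy/critic parameters
        self.m = np.zeros(dim)       # momentum buffer for smoothing
        self.beta, self.f, self.lr = beta, f, lr

    def step(self, neighbor_params, local_grad):
        # Screen the neighborhood (including self) against up to f
        # Byzantine parameter vectors via the trimmed mean.
        stacked = np.vstack([neighbor_params, self.theta[None, :]])
        agg = trimmed_mean(stacked, self.f)
        # Momentum-based smoothing: blend the consensus pull into a
        # slowly varying buffer instead of jumping to agg directly.
        self.m = self.beta * self.m + (1 - self.beta) * (agg - self.theta)
        self.theta = self.theta + self.m + self.lr * local_grad
        return self.theta
```

Under this sketch, a single Byzantine neighbor broadcasting arbitrarily large parameters is discarded by the trim in every coordinate, and the momentum buffer further limits how far any surviving outlier can move the local iterate in one round.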