Improving the Robustness-Utility Trade-off in Decentralized Learning over Sparse Networks
Yangnan Li ⋅ Xuanyu Cao ⋅ Shenghui Song
Abstract
Resilience against Byzantine attackers and fast convergence over sparse networks are both critical for decentralized optimization, yet existing methods fail to achieve them simultaneously. In particular, DSGD-based Byzantine-resilient methods suffer from a high transient complexity of $\mathcal{O}\left((1-\lambda)^{-6}\right)$, where $1-\lambda$ denotes the spectral gap of the network. While bias-correction methods such as Exact Diffusion can improve the topology dependence, naively combining them with robust aggregators can lead to error accumulation. To address this issue, we introduce scaled dual ascent (SDA) within the augmented Lagrangian framework for decentralized optimization, which mitigates error accumulation by scaling the dual update steps. Building on this framework, we propose BRED, which integrates Byzantine-robust Exact Diffusion with SDA. We prove that BRED attains linear speedup and achieves a transient complexity of $\mathcal{O}\left((1-\lambda)^{-2}\right)$ when the Byzantine fraction $\delta$ is small. We further propose a momentum variant, BRED-M, which reduces the Byzantine-affected transient complexity from $\mathcal{O}\left(\delta^2(1-\lambda)^{-6}\right)$ to $\mathcal{O}\left(\delta^2(1-\lambda)^{-4}\right)$. Empirical results on benchmark datasets demonstrate the efficacy of the proposed methods across diverse network topologies.
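For intuition, the following is a minimal sketch of the primal-dual view alluded to above, using assumed notation (local models $x_i$ stacked into $\mathbf{x}$, a doubly stochastic mixing matrix $W$, dual variable $\mathbf{y}$, step size $\mu$, and dual scaling factor $\gamma$); this is an illustration of the general idea, not necessarily the paper's exact formulation:
$$
\min_{\mathbf{x}} \max_{\mathbf{y}} \; \mathcal{L}(\mathbf{x},\mathbf{y}) \;=\; \sum_{i} f_i(x_i) \;+\; \frac{1}{\mu}\,\mathbf{y}^{\top} V \mathbf{x} \;+\; \frac{1}{2\mu}\,\mathbf{x}^{\top} (I-W)\,\mathbf{x}, \qquad V = (I-W)^{1/2}.
$$
Under these assumptions, a plain dual ascent step $\mathbf{y}^{k+1} = \mathbf{y}^{k} + V \mathbf{x}^{k+1}$ lets the perturbations introduced by a robust aggregator accumulate in $\mathbf{y}$ over iterations, whereas a scaled dual step of the form $\mathbf{y}^{k+1} = \mathbf{y}^{k} + \gamma\, V \mathbf{x}^{k+1}$ with $\gamma \in (0,1)$ damps this accumulation, which is the effect the abstract attributes to SDA.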