FedCDWA: Decoupled Federated Prototype Distillation with Hierarchical Wasserstein Aggregation
Abstract
Federated learning enables decentralized clients to train models collaboratively without sharing local data. However, heterogeneous client distributions often induce client drift and hinder convergence. This paper proposes FedCDWA, a decoupled hierarchical federated distillation framework. FedCDWA decouples client-side personalized distillation from server-side mutual distillation to mitigate distillation-induced optimization conflicts. It further adopts Hierarchical Wasserstein Aggregation to combine client prototypes without restrictive parametric assumptions while preserving both intra-class structure and inter-class geometry. To achieve finer-grained feature alignment, Prototype–Variance Dual Alignment matches class-wise means and variances in feature space. We establish convergence guarantees for FedCDWA. Experiments on three datasets demonstrate that FedCDWA consistently improves both global and personalized accuracy across heterogeneity levels, with smaller performance degradation under more severe heterogeneity.
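As a concrete illustration of the Prototype–Variance Dual Alignment term mentioned above, one plausible per-client objective is sketched below; the notation (local class-wise feature mean $\mu_c^k$ and variance $\sigma_c^{2,k}$ on client $k$, their aggregated counterparts $\bar{\mu}_c$ and $\bar{\sigma}_c^2$, and weights $\lambda_\mu$, $\lambda_\sigma$) is assumed for exposition and need not match the paper's exact formulation.

```latex
% Hedged sketch of a Prototype--Variance Dual Alignment loss (assumed notation,
% not the paper's exact definition): client k pulls its class-wise feature means
% and variances toward the aggregated prototype statistics over C classes.
\mathcal{L}_{\mathrm{PVDA}}^{k}
  = \lambda_{\mu}\sum_{c=1}^{C}\bigl\lVert \mu_{c}^{k}-\bar{\mu}_{c}\bigr\rVert_{2}^{2}
  + \lambda_{\sigma}\sum_{c=1}^{C}\bigl\lVert \sigma_{c}^{2,k}-\bar{\sigma}_{c}^{2}\bigr\rVert_{2}^{2}
```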