No Retraining at the Edge: Efficient Resource-Aware Mixed-Precision Quantization via Federated Supernet Learning
Abstract
Federated learning (FL) enables collaborative training across distributed edge devices, but deploying lightweight models in dynamic edge environments remains challenging. Existing methods typically require retraining whenever device resource constraints change, incurring excessive computational overhead. We propose DFMPQ, a dynamic federated mixed-precision quantization framework that enables retraining-free deployment at the edge. DFMPQ trains a weight-sharing mixed-precision supernet via FL that jointly represents diverse bit-width configurations. After training, resource-aware quantized subnets can be derived on demand to satisfy heterogeneous and time-varying resource constraints without additional optimization. Optimizing such a supernet in federated settings is difficult due to optimization interference among heterogeneous bit-widths and the coupling of quantization noise with non-IID data. DFMPQ addresses these issues through semantic-aware training and aggregation mechanisms that stabilize supernet optimization. In addition, a sensitivity-guided greedy search strategy efficiently identifies suitable quantization configurations under a given resource budget. Extensive experiments on multiple datasets and network architectures demonstrate that DFMPQ achieves competitive accuracy at significantly reduced computational cost, enabling efficient deployment in dynamic edge computing environments.
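To make the search step concrete, below is a minimal Python sketch (not the authors' implementation) of a sensitivity-guided greedy bit-width search under a resource budget. The candidate bit-widths, per-layer sensitivity scores, and the bit-storage cost model are all illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a sensitivity-guided greedy bit-width search.
# Assumptions (not from the paper): sensitivity[i] estimates the accuracy
# impact of quantizing layer i, and cost is measured in weight-storage bits.

BIT_CHOICES = [8, 6, 4, 2]  # candidate bit-widths, highest precision first

def model_cost(bits, params):
    """Total weight-storage cost in bits for a per-layer bit assignment."""
    return sum(b * n for b, n in zip(bits, params))

def greedy_bit_search(params, sensitivity, budget_bits):
    """Start every layer at the highest precision, then repeatedly lower
    the precision of the least-sensitive layer until the budget is met."""
    bits = [BIT_CHOICES[0]] * len(params)
    while model_cost(bits, params) > budget_bits:
        # Candidate moves: layers that can still drop to a lower bit-width.
        candidates = [
            (sensitivity[i], i) for i in range(len(params))
            if BIT_CHOICES.index(bits[i]) + 1 < len(BIT_CHOICES)
        ]
        if not candidates:
            break  # budget unreachable even at the lowest precision
        _, i = min(candidates)  # lower the least-sensitive layer first
        bits[i] = BIT_CHOICES[BIT_CHOICES.index(bits[i]) + 1]
    return bits

# Toy usage: 4 layers with parameter counts and assumed sensitivities.
params = [10_000, 50_000, 50_000, 5_000]
sensitivity = [0.9, 0.2, 0.4, 0.8]   # e.g., accuracy drop when quantized
budget = 8 * sum(params) // 2        # half of the full 8-bit footprint
print(greedy_bit_search(params, sensitivity, budget))
```

Because every candidate configuration is a subnet of the shared-weight supernet, a search of this kind only evaluates or costs existing weights; no gradient updates are needed, which is what makes the deployment retraining-free.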