ShapCCS: Shapley-Driven Client Coreset Selection in Federated Learning
Shuo Ji ⋅ Jie Hu ⋅ Zhouqiao He ⋅ Zijie Zhao ⋅ Tianrui Li ⋅ Jie Xu
Abstract
Computational overhead has emerged as a critical bottleneck in Federated Learning (FL). Coreset selection tackles this challenge by constructing an informative subset that represents the full dataset. However, existing approaches optimize coreset construction solely at the data level and enforce a uniform retention ratio across all clients, which ignores client heterogeneity and retains detrimental fragmented clients. In this paper, we first introduce a *gradient projection Shapley value* (GPSV) to evaluate client contributions. GPSV captures both the directional and magnitude information of client updates and enables exact Shapley value calculation with $\mathcal{O}(1)$ per-coalition evaluation. Building on GPSV, we then propose ShapCCS, the first client-level coreset selection strategy for FL. ShapCCS prioritizes clients with high GPSV scores while excluding fragmented clients whose GPSV is negligible or even negative. As a client-level strategy, ShapCCS can be combined with a data-level selection approach and additionally reduces communication costs, an advantage unattainable by data-level methods alone. Extensive experiments demonstrate the superiority of ShapCCS in model performance and robustness to noise. The code is available at https://anonymous.4open.science/r/ShapCCS-5CBB.
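To make the idea of a projection-based Shapley score concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not define the coalition utility, so the sketch assumes one: the utility of a coalition is the scalar projection of its averaged update onto a reference (e.g. global) update direction. Under that assumption each client update is reduced once to a scalar, after which every coalition evaluation works on scalars only and is independent of the model dimension. The function names (`scalar_projections`, `coalition_utility`, `exact_shapley`, `select_clients`) and the `keep_ratio` and `min_score` parameters are hypothetical.

```python
import itertools
import math

import numpy as np


def scalar_projections(client_updates, reference):
    """Reduce each client update g_i to the scalar p_i = <g_i, g> / ||g||.

    The scalar is signed, so it reflects both direction and magnitude
    relative to the reference update g.
    """
    ref = np.asarray(reference, dtype=float)
    norm = np.linalg.norm(ref)
    if norm == 0.0:
        raise ValueError("reference update must be non-zero")
    return np.asarray(client_updates, dtype=float) @ ref / norm


def coalition_utility(projection_sum, size):
    """ASSUMED utility: projection of the coalition's averaged update.

    Because projection is linear, this equals the mean of the members'
    scalar projections, so no model-sized vector is needed here.
    """
    return projection_sum / size if size else 0.0


def exact_shapley(projections):
    """Exact Shapley values by enumerating all coalitions (small n only)."""
    n = len(projections)
    values = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Standard Shapley weight for a coalition of size k without client i.
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = sum(projections[j] for j in subset)
                gain = coalition_utility(s + projections[i], k + 1) - coalition_utility(s, k)
                values[i] += weight * gain
    return values


def select_clients(scores, keep_ratio=0.5, min_score=0.0):
    """Keep the top-scoring clients, dropping any whose score is <= min_score."""
    order = np.argsort(-np.asarray(scores))
    k = max(1, int(round(keep_ratio * len(scores))))
    return [int(i) for i in order[:k] if scores[i] > min_score]


if __name__ == "__main__":
    # Toy example: three clients with 2-dimensional updates.
    updates = [[1.0, 0.0], [0.5, 0.5], [-1.0, 0.0]]
    reference = [1.0, 0.0]
    p = scalar_projections(updates, reference)
    scores = exact_shapley(p)
    print("scores:", scores)
    print("selected:", select_clients(scores, keep_ratio=1.0))
```

In this toy setup the third client's update points against the reference direction, so it receives a negative score and is filtered out even when `keep_ratio` would otherwise admit every client. The enumeration is exponential in the number of clients and is only meant to illustrate the definition on small inputs.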