Expected Returns and Policy Inconsistency-Aware Offline Federated Deep Reinforcement Learning
Abstract
Offline Federated Deep Reinforcement Learning (FDRL) methods aggregate multiple client-side offline Deep Reinforcement Learning (DRL) models, each trained locally, to enable knowledge sharing while preserving privacy. Existing offline FDRL methods assign client weights during global aggregation using either simple averaging or Q-values alone, neglecting the joint consideration of Q-values and policy inconsistency, where the latter reflects the distributional discrepancy between the learned policy and the behavior policy underlying the offline data. As a result, a client with no clear advantage in one aspect but a pronounced disadvantage in the other can disproportionately influence the global model, degrading the global model in that weaker aspect. Moreover, during local training, clients in existing methods are compelled to fully adopt the global model, which harms clients when the global model is weak. To address these limitations, we propose a novel Federated Learning (FL) framework that can be seamlessly integrated into current offline FDRL approaches to improve their performance. Our method considers both policy inconsistency and Q-values when determining the weights of client models, with the Q-values adjusted by a scaling factor so that the two terms remain on comparable numerical scales. The aggregated global model is then distributed to the clients to guide local training. The influence of the global model on a local model is reduced when that client's performance exceeds the global model's, thereby mitigating the harm of a weak global model. Experiments on the Datasets for Deep Data-Driven Reinforcement Learning (D4RL) benchmark demonstrate that our method improves six state-of-the-art (SOTA) offline FDRL methods in terms of return and D4RL score.
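To make the aggregation and local-update ideas concrete, the sketch below gives one plausible reading of the abstract in Python. The function names, the scaling factor `alpha`, the blend coefficient `beta`, the softmax normalization, and the `reduction` rule for strong clients are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


def aggregation_weights(q_values, inconsistencies, alpha=0.1):
    """Combine scaled Q-values and policy inconsistency into client weights.

    q_values:        per-client Q-value estimates (higher is better).
    inconsistencies: per-client policy-inconsistency measures, e.g. a
                     divergence between the learned policy and the behavior
                     policy of the offline data (lower is better).
    alpha:           assumed scaling factor that brings the Q-values onto a
                     scale comparable to the inconsistency term.
    """
    q = np.asarray(q_values, dtype=float)
    d = np.asarray(inconsistencies, dtype=float)
    # Higher scaled Q-value and lower inconsistency yield a larger score.
    scores = alpha * q - d
    # Softmax keeps the weights positive and summing to one (an assumed
    # normalization; the paper may normalize differently).
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()


def aggregate(client_params, weights):
    """Weighted average of client model parameters (one flat vector each)."""
    return np.asarray(weights) @ np.stack(client_params)


def local_update(local_params, global_params, local_return, global_return,
                 beta=0.5, reduction=0.1):
    """Blend the global model into a client's model, shrinking the global
    model's influence when the client already outperforms it
    (hypothetical criterion based on returns)."""
    lam = beta if local_return <= global_return else beta * reduction
    return lam * global_params + (1.0 - lam) * local_params


if __name__ == "__main__":
    w = aggregation_weights(q_values=[120.0, 95.0, 110.0],
                            inconsistencies=[0.8, 0.2, 0.5])
    clients = [np.zeros(4), np.ones(4), np.full(4, 2.0)]
    global_model = aggregate(clients, w)
    # A client whose return exceeds the global model's learns less from it.
    updated = local_update(clients[0], global_model,
                           local_return=130.0, global_return=100.0)
    print(w, global_model, updated)
```

Here the `reduction` factor is one simple way to encode "learn less from a weaker global model"; the paper's actual down-weighting criterion may differ.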