Leveraging Side Information for Communication-Efficient Federated Learning
Berivan Isik · Francesco Pase · Deniz Gunduz · Sanmi Koyejo · Tsachy Weissman · Michele Zorzi
Event URL: https://openreview.net/forum?id=1wakbAoVWo
The high communication cost of sending model updates from the clients to the server is a significant bottleneck for scalable federated learning (FL). Among existing approaches, state-of-the-art bitrate-accuracy tradeoffs have been achieved using stochastic compression methods, in which client $n$ sends a sample from a client-only probability distribution $q_{\phi^{(n)}}$, and the server estimates the mean of the clients' distributions from these samples. However, such methods do not take full advantage of the FL setup, where the server, throughout the training process, has \emph{side information} in the form of a pre-data distribution $p_{\theta}$ that is close to the client's distribution $q_{\phi^{(n)}}$ \emph{in Kullback–Leibler (KL) divergence}. We exploit this \emph{closeness} between the clients' distributions $q_{\phi^{(n)}}$ and the side information $p_{\theta}$ at the server, and propose a framework that requires approximately $D_{KL}(q_{\phi^{(n)}} \| p_{\theta})$ bits of communication. We show that our method can be integrated into many existing stochastic compression frameworks, such as FedPM, Federated SGLD, and QSGD, to attain the same (and often higher) test accuracy with up to a $50\times$ reduction in the bitrate.
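At a high level, the savings come from the fact that a sample from $q_{\phi^{(n)}}$ can be conveyed with roughly $D_{KL}(q_{\phi^{(n)}} \| p_{\theta})$ bits when both sides share the side information $p_{\theta}$ and a common random seed. The Python sketch below illustrates this with a generic importance-sampling (minimal-random-coding style) encoder; the function names, the Gaussian toy distributions, and the 4-bit overhead are illustrative assumptions for this sketch, not the paper's exact algorithm or hyperparameters.

```python
import numpy as np

# Sketch: client and server share a seed, so the server can regenerate the same
# pool of candidates drawn from its side information p_theta. The client only
# transmits the index of the candidate it selects, which costs roughly
# D_KL(q || p) bits plus a small overhead (illustrative assumption: 4 bits).

OVERHEAD_BITS = 4

def encode(log_q, prior_sampler, kl_bits, seed):
    """Client side: weight shared candidates by q/p and pick one index."""
    K = int(np.ceil(2 ** (kl_bits + OVERHEAD_BITS)))   # size of shared candidate pool
    rng = np.random.default_rng(seed)                   # shared randomness with the server
    candidates, log_p = prior_sampler(rng, K)           # candidates drawn from p_theta
    log_w = log_q(candidates) - log_p                   # importance weights q/p (in log space)
    probs = np.exp(log_w - log_w.max())
    probs /= probs.sum()
    idx = rng.choice(K, p=probs)                        # selected candidate
    return idx                                          # ~log2(K) bits on the wire

def decode(prior_sampler, kl_bits, seed, idx):
    """Server side: regenerate the same pool and read off the chosen sample."""
    K = int(np.ceil(2 ** (kl_bits + OVERHEAD_BITS)))
    rng = np.random.default_rng(seed)
    candidates, _ = prior_sampler(rng, K)
    return candidates[idx]

# Toy example: server side information p = N(0, 1), client distribution q = N(0.3, 0.9^2).
def gaussian_prior(rng, K):
    x = rng.standard_normal(K)
    return x, -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

mu, sigma = 0.3, 0.9
log_q = lambda x: -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
kl_nats = np.log(1.0 / sigma) + (sigma**2 + mu**2) / 2 - 0.5   # closed-form D_KL(q || p) for Gaussians
kl_bits = kl_nats / np.log(2)

idx = encode(log_q, gaussian_prior, kl_bits, seed=0)
recovered = decode(gaussian_prior, kl_bits, seed=0, idx=idx)
print(f"sent index {idx} (~{kl_bits + OVERHEAD_BITS:.1f} bits), decoded sample {recovered:.3f}")
```

In this toy run the transmitted index fits in roughly $D_{KL}(q \| p)$ bits (here measured in bits) plus a few bits of overhead, whereas sending the sample itself at float32 precision would cost 32 bits; applying the same idea per parameter, or per mask probability as in FedPM, is the kind of mechanism that yields the bitrate reductions reported above.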
Author Information
Berivan Isik (Stanford University)
Francesco Pase (Università degli Studi di Padova)
Deniz Gunduz (Imperial College London)
Sanmi Koyejo (Stanford University)
Tsachy Weissman (Stanford University)
Michele Zorzi (University of Padua)
More from the Same Authors
- 2021 : Less is More: Feature Selection for Adversarial Robustness with Compressive Counter-Adversarial Attacks
  Emre Ozfatura · Muhammad Zaid Hameed · Kerem Ozfatura · Deniz Gunduz
- 2021 : Active privacy-utility trade-off against a hypothesis testing adversary
  Ecenaz Erdemir · Pier Luigi Dragotti · Deniz Gunduz
- 2023 : Layer-Wise Feedback Alignment is Conserved in Deep Neural Networks
  Zach Robertson · Sanmi Koyejo
- 2023 : FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation
  Dhruv Pai · Andres Carranza · Rylan Schaeffer · Arnuv Tandon · Sanmi Koyejo
- 2023 : Exact Optimality in Communication-Privacy-Utility Tradeoffs
  Berivan Isik · Wei-Ning Chen · Ayfer Ozgur · Tsachy Weissman · Albert No
- 2023 : GPT-Zip: Deep Compression of Finetuned Large Language Models
  Berivan Isik · Hermann Kumbong · Wanyi Ning · Xiaozhe Yao · Sanmi Koyejo · Ce Zhang
- 2023 : Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data
  Alycia Lee · Brando Miranda · Sanmi Koyejo
- 2023 : Are Emergent Abilities of Large Language Models a Mirage?
  Rylan Schaeffer · Brando Miranda · Sanmi Koyejo
- 2023 : Thomas: Learning to Explore Human Preference via Probabilistic Reward Model
  Sang Truong · Duc Nguyen · Tho Quan · Sanmi Koyejo
- 2023 : On learning domain general predictors
  Sanmi Koyejo
- 2023 Workshop: Neural Compression: From Information Theory to Applications
  Berivan Isik · Yibo Yang · Daniel Severo · Karen Ullrich · Robert Bamler · Stephan Mandt
- 2023 : Deceptive Alignment Monitoring
  Andres Carranza · Dhruv Pai · Rylan Schaeffer · Arnuv Tandon · Sanmi Koyejo
- 2023 : Vignettes on Pairwise-Feedback Mechanisms for Learning with Uncertain Preferences
  Sanmi Koyejo
- 2023 Poster: Pairwise Ranking Losses of Click-Through Rates Prediction for Welfare Maximization in Ad Auctions
  Boxiang Lyu · Zhe Feng · Zach Robertson · Sanmi Koyejo
- 2021 Workshop: Information-Theoretic Methods for Rigorous, Responsible, and Reliable Machine Learning (ITR3)
  Ahmad Beirami · Flavio Calmon · Berivan Isik · Haewon Jeong · Matthew Nokleby · Cynthia Rush
- 2021 Affinity Workshop: Women in Machine Learning (WiML) Un-Workshop
  Wenshuo Guo · Beliz Gokkaya · Arushi G K Majha · Vaidheeswaran Archana · Berivan Isik · Olivia Choudhury · Liyue Shen · Hadia Samil · Tatjana Chavdarova
- 2019 Poster: Neural Joint Source-Channel Coding
  Kristy Choi · Kedar Tatwawadi · Aditya Grover · Tsachy Weissman · Stefano Ermon
- 2019 Oral: Neural Joint Source-Channel Coding
  Kristy Choi · Kedar Tatwawadi · Aditya Grover · Tsachy Weissman · Stefano Ermon