Proposed around 2016 as privacy-preserving techniques, federated learning and analytics (FL & FA) have made remarkable progress in theory and practice in recent years. However, there is a growing disconnect between theoretical research and practical applications of federated learning. This workshop aims to bring academics and practitioners closer together to exchange ideas: to discuss actual systems and practical applications that inspire researchers to pursue theoretical and practical research questions with real-world impact, and to take stock of current developments and highlight future directions. To this end, we aim to have keynote talks and panels featuring industry researchers focused on deploying federated learning and analytics in practice, as well as academic research leaders interested in bridging the gap between theory and practice.
For more details, please visit the workshop webpage at https://fl-icml2023.github.io
Fri 12:00 p.m. - 12:05 p.m. | Introduction and Opening Remarks (Opening) | Zheng Xu
Fri 12:05 p.m. - 12:40 p.m. | Invited Talk | Vojta Jina: Lessons from Applying Private Federated Learning
Fri 12:40 p.m. - 1:00 p.m. | Spotlight Talks:
- Privacy Amplification via Compression: Achieving the Optimal Privacy-Accuracy-Communication Trade-off in Distributed Mean Estimation. Presenter: Wei-Ning Chen
- On the Still Unreasonable Effectiveness of Federated Averaging for Heterogeneous Distributed Learning. Presenter: Lingxiao Wang
Fri 1:00 p.m. - 1:15 p.m. | Break
Fri 1:15 p.m. - 1:50 p.m. | Invited Talk | Li Xiong: Federated Learning with Personalized and User-level Differential Privacy
Fri 1:50 p.m. - 2:25 p.m. | Invited Talk | Brendan McMahan: Advances in Privacy and Federated Learning, with Applications to GBoard
Fri 2:25 p.m. - 4:30 p.m. | Poster Session and Lunch
Fri 4:30 p.m. - 5:25 p.m. | Panel Discussion | Peter Kairouz · Song Han · Kamalika Chaudhuri · Florian Tramer
Fri 5:25 p.m. - 6:00 p.m. | Invited Talk | Ce Zhang: Optimizing Communications and Data for Distributed and Decentralized Learning
Fri 6:00 p.m. - 6:15 p.m. | Break
Fri 6:15 p.m. - 6:50 p.m. | Invited Talk | Giulia Fanti: New Variants of Old Challenges in Data Valuation and Privacy
Fri 6:50 p.m. - 7:20 p.m. | Spotlight Talks:
- Privacy Auditing with One (1) Training Run. Presenter: Thomas Steinke
- Federated Heavy Hitter Recovery under Linear Sketching. Presenter: Ziteng Sun
- Towards a Better Theoretical Understanding of Independent Subnetwork Training. Presenter: Peter Richtarik
Fri 7:20 p.m. - 7:55 p.m. | Invited Talk | Chuan Guo: Towards (Truly) Private and Communication-efficient Federated Learning
Fri 7:55 p.m. - 8:00 p.m. | Concluding Remarks
Poster: On the Still Unreasonable Effectiveness of Federated Averaging for Heterogeneous Distributed Learning
Federated Averaging (local SGD) is the most common optimization method for federated learning and has proven effective in many real-world applications, dominating simple baselines like mini-batch SGD. However, theoretically establishing the effectiveness of local SGD remains challenging, leaving a large gap between theory and practice. In this paper, we provide new lower bounds for local SGD, ruling out proposed heterogeneity assumptions that try to capture this "unreasonable" effectiveness of local SGD. We show that accelerated mini-batch SGD is, in fact, min-max optimal under some heterogeneity notions. This highlights the need for new heterogeneity assumptions for federated optimization, and we discuss some alternative assumptions.
Kumar Kshitij Patel · Margalit Glasgow · Lingxiao Wang · Nirmit Joshi · Nati Srebro
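To make the object of study concrete, here is a minimal sketch of the local SGD / FedAvg structure the abstract analyzes: clients take several local gradient steps on heterogeneous objectives before the server averages their models. The quadratic client objectives and all hyperparameters are illustrative, not from the paper.

```python
import numpy as np

# Minimal FedAvg / local SGD sketch on heterogeneous 1-D quadratics.
# Client k minimizes f_k(w) = 0.5 * (w - c_k)^2, so the global optimum
# is mean(c_k). Full participation and exact gradients for simplicity.

def fedavg(centers, rounds=50, local_steps=10, lr=0.1):
    w = 0.0  # global model
    for _ in range(rounds):
        updates = []
        for c in centers:
            w_local = w
            for _ in range(local_steps):  # local (S)GD steps
                w_local -= lr * (w_local - c)
            updates.append(w_local)
        w = float(np.mean(updates))  # server averages client models
    return w

centers = [0.0, 1.0, 5.0]   # heterogeneous client objectives
print(round(fedavg(centers), 3))  # → 2.0, the global optimum mean(centers)
```

On equal-curvature quadratics the averaged fixed point coincides with the global optimum; the paper's lower bounds concern exactly when this benign behavior can and cannot be guaranteed under general heterogeneity assumptions.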
Poster: Towards a Theoretical and Practical Understanding of One-Shot Federated Learning with Fisher Information
Standard federated learning (FL) algorithms typically require multiple rounds of communication between the server and the clients, which has several drawbacks including requiring constant network connectivity, repeated investment of computation resources and susceptibility to privacy attacks. One-Shot FL is a new paradigm that aims to address this challenge by enabling the server to train a global model in a single round of communication. In this work, we present FedFisher, a novel algorithm for one-shot FL that makes use of the Fisher information matrices computed at the local models of clients, motivated by a Bayesian perspective of FL. First, we theoretically analyze FedFisher for two-layer overparameterized ReLU neural networks and show that the error of our one-shot FedFisher global model becomes vanishingly small as the width of the neural networks and amount of local training at clients increases. Next we propose practical variants of FedFisher using the diagonal Fisher and K-FAC approximation for the full Fisher and highlight their communication and compute efficiency for FL. Finally, we conduct extensive experiments on various datasets, which show that these variants of FedFisher consistently improve over several competing baselines. |
Divyansh Jhunjhunwala · Shiqiang Wang · Gauri Joshi
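A hedged sketch of the aggregation idea behind the diagonal-Fisher variant described above: each client sends its local weights together with a diagonal Fisher estimate, and the server averages each coordinate weighted by its Fisher information. The exact FedFisher update may differ; this only illustrates the one-round, Fisher-weighted structure.

```python
import numpy as np

# One-shot Fisher-weighted aggregation (illustrative, not the paper's
# exact algorithm). Client k contributes weights w_k (shape (d,)) and a
# nonnegative diagonal Fisher estimate F_k (shape (d,)); the server
# forms a per-coordinate Fisher-weighted average.

def fisher_weighted_average(weights, fishers, eps=1e-8):
    weights = np.asarray(weights, dtype=float)  # shape (K, d)
    fishers = np.asarray(fishers, dtype=float)  # shape (K, d)
    num = (fishers * weights).sum(axis=0)
    den = fishers.sum(axis=0) + eps
    return num / den

# Toy example: client 0 is confident (high Fisher) about coordinate 0,
# client 1 about coordinate 1; the aggregate keeps each confident value.
w = [[1.0, 0.0], [0.0, 3.0]]
F = [[10.0, 0.1], [0.1, 10.0]]
print(fisher_weighted_average(w, F))
```

Plain averaging would return [0.5, 1.5] here; Fisher weighting instead stays close to whichever client is confident about each coordinate, which is the intuition behind using curvature information for one-shot aggregation.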
Poster: Machine Learning with Feature Differential Privacy
Machine learning applications incorporating differential privacy frequently face significant utility degradation. One prevalent solution is to enhance utility through the use of publicly accessible information. Public data points, well known for their utility-enhancing capabilities in private training, have received considerable attention. However, these public sources can vary substantially in their nature. In this work, we explore the feasibility of leveraging public features of the private dataset. For instance, envision a tabular dataset in which some features are publicly accessible while others are kept private. We delve into this scenario, defining a concept we refer to as feature-DP. We examine feature-DP in the context of private optimization and propose a solution based on the widely used DP-SGD framework. Notably, our framework maintains the advantage of privacy amplification through sub-sampling, even while some features are disclosed. We analyze our algorithm for Lipschitz and convex loss functions and establish privacy and excess empirical risk bounds. Importantly, because our strategy harnesses privacy amplification via sub-sampling, our excess risk bounds converge to zero as the number of data points increases. This enables us to improve upon previously known excess risk bounds for label differential privacy, and provides a response to an open question posed by Ghazi et al. (2021). We apply our methodology to the Purchase100 dataset, finding that the public features our framework can leverage indeed improve the balance between utility and privacy.
Saeed Mahloujifar · Chuan Guo · G. Edward Suh · Kamalika Chaudhuri
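For readers unfamiliar with the base mechanism, here is the generic DP-SGD step the framework above builds on: clip each per-example gradient, sum, and add Gaussian noise. The feature-DP variant would privatize only the private features' contribution; those details are in the paper and not reproduced here, and the hyperparameters below are arbitrary.

```python
import numpy as np

# Generic DP-SGD step (the building block the feature-DP framework
# extends; this is NOT the paper's feature-DP mechanism itself).

def dp_sgd_step(w, per_example_grads, clip_norm=1.0, noise_mult=1.0,
                lr=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # rescale so each example's gradient has norm at most clip_norm
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    g_sum = np.sum(clipped, axis=0)
    # Gaussian noise scaled to the clipping bound (the sensitivity)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=g_sum.shape)
    g_priv = (g_sum + noise) / len(per_example_grads)
    return w - lr * g_priv

w = np.zeros(3)
grads = [np.array([3.0, 0.0, 0.0]), np.array([0.0, 0.4, 0.0])]
w_next = dp_sgd_step(w, grads)
print(w_next.shape)  # (3,)
```

Clipping bounds each example's influence (the sensitivity), which is what lets calibrated Gaussian noise yield an $(\varepsilon, \delta)$-DP guarantee, and sub-sampling the batch amplifies that guarantee, the property the abstract says is preserved.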
Poster: Hiding in Plain Sight: Disguising Data Stealing Attacks in Federated Learning
Malicious server (MS) attacks have scaled data stealing in federated learning to more challenging settings. However, concerns regarding client-side detectability of MS attacks were raised, questioning their practicality once they are publicly known. In this work, we thoroughly study the problem of detectability for the first time. We show that most prior MS attacks, which fundamentally rely on one of two key principles, are detectable by principled client-side checks. Further, we propose SEER, a novel attack framework that is less detectable by design, and able to steal user data from gradients even for large batch sizes (up to 512) and under secure aggregation. Our key insight is the use of a secret decoder, jointly trained with the shared model to disaggregate in a secret space. Our work is a promising first step towards more principled treatment of MS attacks, paving the way for realistic data stealing that can compromise user privacy in real-world deployments. |
Kostadin Garov · Dimitar I. Dimitrov · Nikola Jovanović · Martin Vechev

Poster: Beyond Secure Aggregation: Scalable Multi-Round Secure Collaborative Learning
Privacy-preserving machine learning (PPML) has achieved exciting breakthroughs for secure collaborative training of machine learning models under formal information-theoretic privacy guarantees. Despite these recent advances, the communication bottleneck remains a major obstacle to scaling to large neural networks. To address this challenge, in this work we introduce the first end-to-end multi-round multi-party neural network training framework with linear communication complexity under formal information-theoretic privacy guarantees. Our key contribution is a scalable secure computing mechanism for iterative polynomial operations that incurs only linear communication overhead, significantly improving over the quadratic state of the art, while providing formal end-to-end multi-round information-theoretic privacy guarantees. In doing so, our framework achieves the same adversary tolerance, resilience to user dropouts, and model accuracy as the state of the art, while addressing a key challenge in scalable training.
Umit Basaran · Xingyu Lu · Basak Guler

Poster: Federated Experiment Design under Distributed Differential Privacy
Experiment design has a rich history dating back to the early 1920s and has found numerous critical applications across various fields since then. However, the use and collection of users' data in experiments often involve sensitive personal information, so additional measures to protect individual privacy are required during data collection, storage, and usage. In this work, we focus on the rigorous protection of users' privacy (under the notion of differential privacy (DP)) while minimizing the trust toward service providers. Specifically, we consider the estimation of the average treatment effect (ATE) under Neyman's potential outcome framework under DP and secure aggregation, a distributed protocol enabling a service provider to aggregate information without accessing individual data. To achieve DP, we design local privatization mechanisms that are compatible with secure aggregation. We show that when introducing DP noise, it is imperative to 1) cleverly split privacy budgets to estimate both the mean and variance of the outcomes and 2) carefully calibrate the confidence intervals according to the DP noise. Finally, we present comprehensive experimental evaluations of our proposed schemes and show the privacy-utility trade-offs in experiment design. |
Wei-Ning Chen · Graham Cormode · Akash Bharadwaj · Peter Romov · Ayfer Ozgur

Poster: FedFwd: Federated Learning without Backpropagation
In federated learning (FL), clients with limited resources can disrupt training efficiency. A potential solution to this problem is to leverage a new learning procedure that does not rely on the computation- and memory-intensive backpropagation (BP) algorithm. This study presents a novel approach to FL called FedFwd that employs a recent BP-free algorithm by Hinton (2022), namely the Forward-Forward algorithm, during the local training process. Unlike previous methods, FedFwd does not require the computation of gradients, and therefore there is no need to store all intermediate activation values during training. We conduct various experiments to evaluate FedFwd on standard datasets including MNIST and CIFAR-10, and show that it performs competitively with other BP-dependent FL methods.
Seonghwan Park · Dahun Shin · Jinseok Chung · Namhoon Lee

Poster: Unleashing the Power of Randomization in Auditing Differentially Private ML
We present a rigorous methodology for auditing differentially private machine learning algorithms by adding multiple carefully designed examples called canaries. We take a first principles approach based on three key components. First, we introduce Lifted Differential Privacy (Lifted DP) that expands the definition of differential privacy to handle randomized datasets. This gives us the freedom to design randomized canaries. Second, we audit Lifted DP by trying to distinguish between the model trained with $K$ canaries versus $K-1$ canaries in the dataset, leaving one canary out. By drawing the canaries i.i.d., Lifted DP can leverage the symmetry in the design and reuse each privately trained model to run multiple statistical tests, one for each canary. Third, we introduce novel confidence intervals that take advantage of the multiple test statistics by adapting to the empirical higher-order correlations. Together, this new recipe demonstrates significant improvements in sample complexity, both theoretically and empirically, using synthetic and real data. Further, recent advances in designing stronger canaries can be readily incorporated into the new framework.
Krishna Pillutla · Galen Andrew · Peter Kairouz · Hugh B McMahan · Alina Oprea · Sewoong Oh

Poster: Demystifying Local and Global Fairness Trade-offs in Federated Learning Using Information Theory
We present an information-theoretic perspective on group fairness trade-offs in federated learning (FL) with respect to sensitive attributes such as gender and race. Existing works mostly focus on either global fairness (overall disparity of the model across all clients) or local fairness (disparity of the model at each individual client), without necessarily considering their trade-offs. There is a lack of understanding of the interplay between global and local fairness in FL, and of if and when one implies the other. To address this gap, we leverage a body of work in information theory called partial information decomposition (PID) to identify three sources of unfairness in FL, namely Unique Disparity, Redundant Disparity, and Masked Disparity. Using canonical examples, we demonstrate how these three disparities contribute to global and local fairness. This decomposition helps us derive fundamental limits and trade-offs between global and local fairness, particularly under data heterogeneity, as well as derive conditions under which one implies the other. We also present experimental results on real-world datasets to support our theoretical findings. This work offers a more nuanced understanding of the sources of disparity in FL that can inform the use of local disparity mitigation techniques, and their convergence and effectiveness when deployed in practice.
Faisal Hamman · Sanghamitra Dutta

Poster: Fast and Communication Efficient Decentralized Learning with Local Updates
Gossip and random-walk-based learning are two widely used classes of decentralized learning algorithms. Gossip algorithms (both synchronous and asynchronous) suffer from high communication cost, while random-walk-based learning suffers from long convergence time. In this paper, we design DIGEST, a fast and communication-efficient asynchronous decentralized learning mechanism that takes advantage of both the gossip and random-walk ideas, focusing on stochastic gradient descent (SGD). DIGEST is an asynchronous decentralized learning mechanism built on local SGD, which was originally designed for communication-efficient centralized learning. We analyze the convergence of DIGEST and prove that it approaches the optimal solution asymptotically for both iid and non-iid data distributions. We evaluate the performance of DIGEST for logistic regression and a deep neural network, ResNet20. The simulation results confirm that multi-stream DIGEST has favorable convergence properties; its convergence time outperforms the baselines when the data distribution is non-iid.
Peyman Gholami · Hulya Seferoglu

Poster: Distributed Architecture Search over Heterogeneous Distributions
Federated learning (FL) assists distributed machine learning when data cannot be shared with a centralized server. Recent advancements in FL use predefined, architecture-based learning for all clients. However, given that clients' data are invisible to the server and data distributions are non-identical across clients, a predefined architecture discovered in a centralized setting may not be an optimal solution for all the clients in FL. Motivated by this challenge, we introduce SPIDER, an algorithmic framework that aims to Search PersonalIzed neural architectures for feDERated learning. SPIDER is designed around two unique features: (1) alternately optimizing one architecture-homogeneous global model (Supernet) in a generic FL manner and one architecture-heterogeneous local model connected to the global model by weight-sharing-based regularization; and (2) achieving the architecture-heterogeneous local model via an operation-level, perturbation-based neural architecture search method. Experimental results demonstrate that SPIDER outperforms other state-of-the-art personalization methods on three datasets.
Erum Mushtaq · Chaoyang He · Jie Ding · Salman Avestimehr

Poster: Re-Weighted Softmax Cross-Entropy to Control Forgetting in Federated Learning
In federated learning, a global model is learned by aggregating model updates computed at a set of independent client nodes. A key challenge in this setting is data heterogeneity across clients, which results in differing local objectives and can lead clients to overly minimize their own local objective, diverging from the global solution. We demonstrate that individual client models experience catastrophic forgetting with respect to data from other clients, and we propose an efficient approach that modifies the cross-entropy objective on a per-client basis by re-weighting the softmax logits prior to computing the loss. This approach shields classes outside a client's label set from abrupt representation change. We empirically demonstrate that it can alleviate client forgetting and provide consistent improvements to standard federated learning algorithms. Our method is particularly beneficial under the most challenging federated learning settings, where data heterogeneity is high and client participation in each round is low.
Gwen Legate · Lucas Caccia · Eugene Belilovsky
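A sketch of the mechanism the abstract describes: re-weight the softmax logits so that classes absent from a client's label set contribute little to the loss, and so their representations are not pushed around by local training. The hard masking via a large negative offset below is an illustrative simplification; the paper's actual re-weighting scheme may differ.

```python
import numpy as np

# Per-client re-weighted softmax cross-entropy (illustrative sketch).
# Classes outside `client_classes` get a large negative logit offset,
# effectively removing them from the softmax normalization.

def reweighted_ce(logits, label, client_classes, num_classes, offset=30.0):
    mask = np.full(num_classes, -offset)
    mask[list(client_classes)] = 0.0   # keep the client's own classes
    z = logits + mask                  # suppress absent classes
    z = z - z.max()                    # numerically stable softmax
    logp = z - np.log(np.exp(z).sum())
    return -logp[label]

logits = np.array([2.0, 1.0, 0.0, -1.0])
# Client only holds classes {0, 1}; classes 2 and 3 are masked out.
loss = reweighted_ce(logits, label=0, client_classes={0, 1}, num_classes=4)
print(loss > 0)  # True
```

Because the masked classes receive almost no gradient from the normalizer, local updates cannot collapse their logits, which is the forgetting mechanism the paper targets.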
Poster: Tackling the Data Heterogeneity in Asynchronous Federated Learning with Cached Update Calibration
Yujia Wang · Yuanpu Cao · Jingcheng Wu · Ruoyu Chen · Jinghui Chen

Poster: Privacy Amplification via Compression: Achieving the Optimal Privacy-Accuracy-Communication Trade-off in Distributed Mean Estimation
Privacy and communication constraints are two major bottlenecks in federated learning (FL) and analytics (FA). We study the optimal accuracy of mean and frequency estimation for FL and FA respectively under joint communication and $(\varepsilon, \delta)$-differential privacy (DP) constraints. We consider both the central and the multi-message shuffling DP models. We show that in order to achieve the optimal $\ell_2$ error under $(\varepsilon, \delta)$-DP, it is sufficient for each client to send $\Theta\left( n \min\left(\varepsilon, \varepsilon^2\right)\right)$ bits for FL and $\Theta\left(\log\left( n\min\left(\varepsilon, \varepsilon^2\right) \right)\right)$ bits for FA to the server, where $n$ is the number of clients. We propose two different ways to leverage compression for privacy amplification and achieve the optimal privacy-communication-accuracy trade-off. In both cases, each client communicates only partial information about its sample and we show that privacy is amplified by randomly selecting the part contributed by each client. In the first method, the random selection is revealed to the server, which results in a central DP guarantee with optimal privacy-communication-accuracy trade-off. In the second method, the random data parts at each client are privatized locally and anonymized by a secure shuffler, eliminating the need for a trusted server. This results in a multi-message shuffling scheme with the same optimal trade-off. As a result, our paper establishes the optimal three-way trade-off between privacy, communication, and accuracy for both the central DP and multi-message shuffling frameworks.
Wei-Ning Chen · Dan Song · Ayfer Ozgur · Peter Kairouz

Poster: FedSelect: Customized Selection of Parameters for Fine-Tuning during Personalized Federated Learning
Recent advancements in federated learning (FL) seek to increase client-level performance by fine-tuning client parameters on local data or personalizing architectures for the local task. Existing methods for such personalization either prune a global model or fine-tune a global model on a local client distribution. However, these existing methods either personalize at the expense of retaining important global knowledge, or predetermine network layers for fine-tuning, resulting in suboptimal storage of global knowledge within client models. Enlightened by the lottery ticket hypothesis, we first introduce a hypothesis for finding optimal client subnetworks to locally fine-tune while leaving the rest of the parameters frozen. We then propose a novel FL framework, FedSelect, using this procedure that directly personalizes $\textit{both client subnetwork structure and parameters}$, via the simultaneous discovery of optimal parameters for personalization and the rest of parameters for global aggregation $\textit{during training}$. We show that this method achieves promising results on CIFAR-10.
Rishub Tamirisa · John Won · Chengjun Lu · Ron Arel · Andy Zhou

Poster: A New Theoretical Perspective on Data Heterogeneity in Federated Optimization
In federated optimization, data heterogeneity is the main reason existing theoretical analyses are pessimistic about the convergence error caused by local updates. However, experimental results have shown that more local updates can improve the convergence rate and reduce the communication cost when data are heterogeneous. This paper bridges the gap between theoretical understanding and practical performance by providing a general theoretical analysis of federated averaging (FedAvg) with non-convex objective functions, from a new perspective on data heterogeneity. Identifying the limitations of the commonly used bounded-gradient-divergence assumption, we propose a new assumption, termed the heterogeneity-driven Lipschitz assumption, which characterizes the fundamental effect of data heterogeneity on local updates. We find that the widely used local Lipschitz constant is affected by data heterogeneity, which has been neglected in the literature. The proposed heterogeneity-driven Lipschitz constant captures the information about data heterogeneity contained in the local Lipschitz constant, while the information about gradient smoothness is captured by the global Lipschitz assumption. Based on the new assumption, we derive novel convergence bounds for both full participation and partial participation, which are tighter and show that more local updates can improve the convergence rate even when data are highly heterogeneous. Furthermore, the assumptions used in this paper are weaker than those used in the literature.
Jiayi Wang · Shiqiang Wang · Rong-Rong Chen · Mingyue Ji

Poster: Don’t Memorize; Mimic The Past: Federated Class Incremental Learning Without Episodic Memory
Deep learning models are prone to forgetting information learned in the past when trained on new data. This problem becomes even more pronounced in the context of Federated Learning (FL), where data is decentralized and subject to independent changes for each user. Continual Learning (CL) studies this so-called catastrophic forgetting phenomenon primarily in centralized settings, where the learner has direct access to the complete training dataset. However, applying CL techniques to FL is not straightforward due to privacy concerns and resource limitations. This paper presents a framework for federated class incremental learning that utilizes a generative model to synthesize samples from past distributions instead of storing part of past data. Then, clients can leverage the generative model to mitigate catastrophic forgetting locally. To preserve privacy, the generative model is trained on the server using data-free methods at the end of each task without requesting data from clients. We demonstrate significant improvements for the CIFAR-100 dataset compared to existing baselines. |
Sara Babakniya · Zalan Fabian · Chaoyang He · Mahdi Soltanolkotabi · Salman Avestimehr

Poster: Fed-CPrompt: Contrastive Prompt for Rehearsal-Free Federated Continual Learning
Federated continual learning (FCL) learns incremental tasks over time from confidential datasets distributed across clients. This paper focuses on rehearsal-free FCL, which has severe forgetting issues when learning new tasks due to the lack of access to historical task data. To address this issue, we propose Fed-CPrompt based on prompt learning techniques to obtain task-specific prompts in a communication-efficient way. Fed-CPrompt introduces two key components, asynchronous prompt learning, and contrastive continual loss, to handle asynchronous task arrival and heterogeneous data distributions in FCL, respectively. Extensive experiments demonstrate the effectiveness of Fed-CPrompt in achieving SOTA rehearsal-free FCL performance. |
Gaurav Bagwe · Xiaoyong Yuan · Miao Pan · Lan Zhang

Poster: Improving Accelerated Federated Learning with Compression and Importance Sampling
Federated Learning is a collaborative training framework that leverages heterogeneous data distributed across a vast number of clients. Since it is practically infeasible to request and process all clients during the aggregation step, partial participation must be supported. In this setting, the communication between the server and clients poses a major bottleneck. To reduce communication loads, there are two main approaches: compression and local steps. Recent work by Mishchenko et al. (2022) introduced the new ProxSkip method, which achieves an accelerated rate using the local steps technique. Follow-up works successfully combined local steps acceleration with partial participation (Grudzień et al., 2023; Condat et al., 2023) and gradient compression (Condat et al., 2022). In this paper, we finally present a complete method for Federated Learning that incorporates all necessary ingredients: Local Training, Compression, and Partial Participation. Moreover, we analyze the general sampling framework for partial participation and derive an importance sampling scheme, which leads to even better performance. We experimentally demonstrate the advantages of the proposed method in practice. |
Michał Grudzień · Grigory Malinovsky · Peter Richtarik

Poster: Federated Heavy Hitter Recovery under Linear Sketching
Motivated by real-life deployments of multi-round federated analytics with secure aggregation, we investigate the fundamental communication-accuracy tradeoffs of the heavy hitter discovery and approximate (open-domain) histogram problems under a linear sketching constraint. We propose efficient algorithms based on local subsampling and invertible bloom look-up tables (IBLTs). We also show that our algorithms are information-theoretically optimal for a broad class of interactive schemes. The results show that the linear sketching constraint does increase the communication cost for both tasks by introducing an extra linear dependence on the number of users in a round. Moreover, our results also establish a separation between the communication cost for heavy hitter discovery and approximate histogram in the multi-round setting. The dependence on the number of rounds $R$ is at most logarithmic for heavy hitter discovery whereas that of approximate histogram is $\Theta(\sqrt{R})$. We also empirically demonstrate our findings.
Adria Gascon · Peter Kairouz · Ziteng Sun · Ananda Suresh

Poster: Exact Optimality in Communication-Privacy-Utility Tradeoffs
We study the mean estimation problem under communication and local differential privacy constraints. While previous work has proposed order-optimal algorithms for the same problem (i.e., asymptotically optimal as we spend more bits), exact optimality (in the non-asymptotic setting) still has not been achieved. We take a step towards characterizing the exact-optimal approach in the presence of shared randomness and identify several necessary conditions for exact optimality. We prove that one of the necessary conditions is to utilize a rotationally symmetric shared random codebook. Based on this, we propose a randomization mechanism where the codebook is a randomly rotated simplex -- satisfying the necessary properties of the exact-optimal codebook. The proposed mechanism is based on a $k$-closest encoding which we prove to be exact-optimal for the randomly rotated simplex codebook.
Berivan Isik · Wei-Ning Chen · Ayfer Ozgur · Tsachy Weissman · Albert No

Poster: Guiding The Last Layer in Federated Learning with Pre-Trained Models
Federated Learning (FL) is an emerging paradigm that enables a model to be trained across a number of participants without sharing data. Recent works have begun to consider the effects of using pre-trained models as an initialization point for existing FL algorithms; however, these approaches ignore the vast body of efficient transfer learning literature from the centralized learning setting. Here we revisit the problem of FL from a pre-trained model considered in prior work and expand it to a set of computer vision transfer learning problems. We first observe that simply fitting a linear classification head can be efficient and effective in many cases. We then show that in the FL setting, fitting a classifier using the Nearest Class Means (NCM) can be done exactly and orders of magnitude more efficiently than existing proposals, while obtaining strong performance. Finally, we demonstrate that using a two-phase approach of obtaining the classifier and then fine-tuning the model can yield rapid convergence and improved generalization in the federated setting. We demonstrate the potential our method has to reduce communication and compute costs while achieving better model performance. |
Gwen Legate · Nicolas Bernier · Lucas Caccia · Edouard Oyallon · Eugene Belilovsky
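The abstract's claim that a Nearest Class Means classifier can be fit "exactly" in the federated setting follows from class means being decomposable into sums and counts. The sketch below illustrates that one-round aggregation; the frozen pre-trained backbone producing the features is abstracted away, and the toy data are stand-ins.

```python
import numpy as np

# Exact federated Nearest Class Mean (NCM) head in one round:
# each client sends per-class feature sums and counts, and the server
# recovers the pooled class means exactly, with no gradient training.

def client_stats(features, labels, num_classes):
    d = features.shape[1]
    sums = np.zeros((num_classes, d))
    counts = np.zeros(num_classes)
    for x, y in zip(features, labels):
        sums[y] += x
        counts[y] += 1
    return sums, counts

def server_ncm(all_stats):
    sums = sum(s for s, _ in all_stats)
    counts = sum(c for _, c in all_stats)
    return sums / np.maximum(counts, 1)[:, None]  # global class means

# Two clients with disjoint data; the result matches the pooled mean.
f1, y1 = np.array([[0.0, 0.0], [2.0, 2.0]]), [0, 0]
f2, y2 = np.array([[4.0, 4.0]]), [0]
means = server_ncm([client_stats(f1, y1, 1), client_stats(f2, y2, 1)])
print(means[0])  # [2. 2.]
```

Because sums and counts aggregate linearly, the federated result is identical to computing the means on the pooled dataset, which is why no iterative communication is needed for this classifier head.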
Poster: Green Federated Learning
The amount of compute used in training state-of-the-art models is increasing exponentially (doubling every 10 months between 2015 and 2022), resulting in a large carbon footprint. Federated Learning (FL) can also be resource-intensive and have a significant carbon footprint, particularly when deployed at scale. Unlike centralized AI, which can reliably tap into renewables at strategically placed data centers, cross-device FL may leverage as many as hundreds of millions of globally distributed end-user devices with diverse energy sources. Green AI is a novel and important research area where carbon footprint is regarded as an evaluation criterion for AI, alongside accuracy, convergence speed, and other metrics. In this paper, we propose the concept of Green FL, which involves optimizing FL parameters and making design choices to minimize carbon emissions consistent with competitive performance and training time. First, we adopt a data-driven approach to quantify the carbon emissions of FL by directly measuring real-world, at-scale FL tasks running on millions of phones. Second, we present challenges, guidelines, and lessons learned from studying the trade-off between energy efficiency, performance, and time-to-train in a production FL system.
Ashkan Yousefpour · Shen Guo · Ashish Shenoy · Sayan Ghosh · Pierre Stock · Kiwan Maeng · Schalk-Willem Krüger · Michael Rabbat · Carole-Jean Wu · Ilya Mironov 🔗 |
-
|
Adaptive Federated Learning with Auto-Tuned Clients
(
Poster
)
link »
Federated learning (FL) is a distributed machine learning framework where the global model of a central server is trained via multiple collaborative steps by participating clients without sharing their data. While FL is a flexible framework, in which the distribution of local data, the participation rate, and the computing power of each client can vary greatly, such flexibility gives rise to many new challenges, especially in hyperparameter tuning on both the server and the client side. We propose $\Delta$-SGD, a simple step size rule for SGD that enables each client to use its own step size by adapting to the local smoothness of the function it is optimizing. We provide empirical results showing the benefit of client adaptivity in various FL scenarios. In particular, our proposed method achieves top-1 accuracy in 73% and top-2 accuracy in 100% of the experiments considered, without additional tuning.
|
J. Lyle Kim · Mohammad Taha Toghani · Cesar Uribe · Anastasios Kyrillidis 🔗 |
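As a rough illustration of client-side step sizes that adapt to local smoothness, one can estimate the smoothness constant from successive iterates and gradients. This is a simplified stand-in; the paper's actual $\Delta$-SGD rule is different and comes with guarantees, and the function and constant choices below are arbitrary.

```python
def local_smoothness_step(x_prev, x_curr, g_prev, g_curr, fallback=0.1):
    """Secant-style estimate of the local smoothness constant, used to
    set an inverse-smoothness step size. A simplified stand-in for the
    paper's Delta-SGD rule, not the rule itself."""
    dx = sum((a - b) ** 2 for a, b in zip(x_curr, x_prev)) ** 0.5
    dg = sum((a - b) ** 2 for a, b in zip(g_curr, g_prev)) ** 0.5
    if dg == 0.0:  # flat region: keep a conservative default
        return fallback
    L_hat = dg / max(dx, 1e-12)   # local smoothness estimate
    return 1.0 / (2.0 * L_hat)    # classical 1/(2L) safe step

def client_sgd(grad, x0, steps=20, eta0=0.1):
    """Run gradient descent on one client, re-tuning the step size from
    the local curvature at every iteration."""
    x_prev, g_prev = list(x0), grad(x0)
    x = [a - eta0 * g for a, g in zip(x_prev, g_prev)]
    for _ in range(steps):
        g = grad(x)
        eta = local_smoothness_step(x_prev, x, g_prev, g)
        x_prev, g_prev = x, g
        x = [a - eta * b for a, b in zip(x, g)]
    return x
```

On a quadratic with curvature 4, the estimate recovers the true smoothness and the iterates contract geometrically without any manual tuning.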
-
|
Federated, Fast, and Private Visualization of Decentralized Data
(
Poster
)
link »
Data visualization is an important step in many machine learning applications, as it allows for detecting outliers and discovering latent structure within data samples. In high-dimensional settings, visualization can be performed by embedding the samples into a low-dimensional space. There are several existing methods that perform this embedding efficiently, but many of them rely on the assumption that all the data are locally available. In order to use such methods in a distributed setting, one would have to pool all of the datasets into a single site. However, in many domains, communication overhead and privacy concerns often preclude aggregating data from different data sources. To overcome this issue, we previously proposed decentralized Stochastic Neighbouring Embedding (dSNE), where one can embed high-dimensional data into a low-dimensional space in a decentralized manner. Yet, the dSNE algorithm still presents a couple of challenges. Since dSNE communicates in an iterative manner, communication overhead may still be high. In addition, privacy is not formally guaranteed. In this paper, we introduce Faster AdaCliP dSNE (F-dSNE), which reduces communication among sites while satisfying $(\epsilon, \delta)$-differential privacy. Our experiments on four multi-site neuroimaging datasets demonstrate that we can still obtain promising results while addressing these remaining challenges.
|
Debbrata Kumar Saha · Vince Calhoun · Soo Min Kwon · Anand Sarwate · Rekha Saha · Sergey Plis 🔗 |
-
|
Private Federated Learning with Dynamic Power Control via Non-Coherent Over-the-Air Computation
(
Poster
)
link »
To further preserve model weight privacy and improve model performance in Federated Learning (FL), we propose an FL scheme via Over-the-Air Computation (AirComp) based on dynamic power control. The edge devices transmit the signs of local stochastic gradients by activating two adjacent orthogonal frequency division multiplexing (OFDM) subcarriers, and majority votes (MVs) at the edge server are obtained by exploiting the energy accumulation on the subcarriers. We then propose a dynamic power control algorithm to further offset the bias in the aggregated MV values. We show that the whole scheme can mitigate the impact of time synchronization error, channel fading, and noise. The convergence of the scheme is proved theoretically. |
Anbang Zhang · Shuaishuai Guo · Shuai Liu 🔗 |
-
|
A Joint Training-Calibration Framework for Test-Time Personalization with Label Distribution Shift in Federated Learning
(
Poster
)
link »
Data heterogeneity has been a challenging issue in federated learning, in both the training and the inference stages, motivating a variety of approaches that learn either personalized models for participating clients or test-time adaptations for unseen clients. One such approach employs a shared feature representation and a customized classifier head for each client. However, previous works either do not utilize the global head with its rich knowledge, or assume the new clients have enough labeled data, which significantly limits their broader practicality. In this work, we propose a lightweight framework that tackles the label shift issue in model deployment via test-prior estimation and model-prediction calibration. We emphasize the importance of training a balanced global model in FL and the general effectiveness of prior estimation approaches. Numerical evaluation results on benchmark datasets with various label distribution shift cases demonstrate the superiority of our proposed framework. |
Jian Xu · Shao-Lun Huang 🔗 |
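The prediction-calibration step can be illustrated with the standard prior-ratio correction under label shift. This is a sketch of the general idea only; the paper's framework additionally covers how the test prior is estimated and how a balanced global model is trained.

```python
def calibrate(probs, train_prior, test_prior):
    """Re-weight a model's class probabilities p(y|x) by the ratio of the
    estimated test-time class prior to the training prior, then
    renormalize. A standard prior-correction step used here purely as an
    illustration of model-prediction calibration under label shift."""
    w = [q / max(p, 1e-12) for p, q in zip(train_prior, test_prior)]
    scaled = [pr * wi for pr, wi in zip(probs, w)]
    z = sum(scaled)
    return [s / z for s in scaled]
```

For an uninformative prediction under a balanced training prior, the calibrated output simply follows the estimated test prior, which is the intended behavior when the input carries no class signal.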
-
|
Learning-augmented private algorithms for multiple quantile release
(
Poster
)
link »
When applying differential privacy to sensitive data, we can often improve performance using external information such as other sensitive data, public data, or human priors. We propose to use the learning-augmented algorithms (or algorithms with predictions) framework, previously applied largely to improve time complexity or competitive ratios, as a powerful way of designing and analyzing privacy-preserving methods that can take advantage of such external information to improve utility. This idea is instantiated on the important task of multiple quantile release, for which we derive error guarantees that scale with a natural measure of prediction quality while (almost) recovering state-of-the-art prediction-independent guarantees. Our analysis enjoys several advantages, including minimal assumptions about the data, a natural way of adding robustness, and the provision of useful surrogate losses for two novel "meta" algorithms that learn predictions from other (potentially sensitive) data. We conclude with experiments on challenging tasks demonstrating that learning predictions across one or more instances can lead to large error reductions while preserving privacy. |
Mikhail Khodak · Kareem Amin · Travis Dick · Sergei Vassilvitskii 🔗 |
-
|
Leveraging Side Information for Communication-Efficient Federated Learning
(
Poster
)
link »
The high communication cost of sending model updates from the clients to the server is a significant bottleneck for scalable federated learning (FL). Among existing approaches, state-of-the-art bitrate-accuracy tradeoffs have been achieved using stochastic compression methods -- in which the client $n$ sends a sample from a client-only probability distribution $q_{\phi^{(n)}}$, and the server estimates the mean of the clients' distributions using these samples. However, such methods do not take full advantage of the FL setup where the server, throughout the training process, has side information in the form of a pre-data distribution $p_{\theta}$ that is close to the client's distribution $q_{\phi^{(n)}}$ in Kullback–Leibler (KL) divergence. We exploit this closeness between the clients' distributions $q_{\phi^{(n)}}$'s and the side information $p_{\theta}$ at the server, and propose a framework that requires approximately $D_{KL}(q_{\phi^{(n)}}|| p_{\theta})$ bits of communication. We show that our method can be integrated into many existing stochastic compression frameworks such as FedPM, Federated SGLD, and QSGD to attain the same (and often higher) test accuracy with up to $50$ times reduction in the bitrate.
|
Berivan Isik · Francesco Pase · Deniz Gunduz · Sanmi Koyejo · Tsachy Weissman · Michele Zorzi 🔗 |
-
|
On Differentially Private Federated Linear Contextual Bandits
(
Poster
)
link »
We consider the cross-silo federated linear contextual bandit (LCB) problem under differential privacy, where multiple silos (agents) interact with local users and communicate via a central server to realize collaboration without sacrificing each user's privacy. We identify three issues in the state of the art: (i) failure of the claimed privacy protection, (ii) an incorrect regret bound due to noise miscalculation, and (iii) ungrounded communication cost. To resolve these issues, we take a two-step principled approach. First, we design an algorithmic framework consisting of a generic federated LCB algorithm and flexible privacy protocols. Then, leveraging the proposed framework, we study federated LCBs under two different privacy constraints. Specifically, we first establish performance guarantees under silo-level local differential privacy, which fixes the issues present in the state-of-the-art algorithm. To further improve the regret performance, we next consider the shuffle model of differential privacy, under which we show that our algorithm can achieve nearly "optimal" regret without a trusted central server. |
Xingyu Zhou · Sayak Ray Chowdhury 🔗 |
-
|
Distributed Mean Estimation for Multi-Message Shuffled Privacy
(
Poster
)
link »
In this paper, we study distributed mean estimation (DME) under privacy and communication constraints in the multi-message shuffle model. We propose communication-efficient algorithms for privately estimating the mean of bounded $\ell_2$-norm and $\ell_{\infty}$-norm vectors. Our algorithms are designed by giving unequal privacy to different resolutions of the vector (through binary expansion) and appropriately combining this with coordinate sampling. We show that our proposed algorithms achieve order-optimal privacy-communication-performance trade-offs.
|
Antonious Girgis · Suhas Diggavi 🔗 |
-
|
Federated Ensemble-Directed Offline Reinforcement Learning
(
Poster
)
link »
We consider the problem of federated offline reinforcement learning (RL), where clients must collaboratively learn a control policy using only data collected by unknown behavior policies. Naively combining a standard offline RL approach with a standard federated learning approach to solve this problem can lead to poorly performing policies. We develop the Federated Ensemble-Directed Offline Reinforcement Learning Algorithm (FEDORA), which distills the collective wisdom of the clients using an ensemble learning approach. We show that FEDORA significantly outperforms other approaches, including offline RL over the combined data pool, in various complex continuous control and real-world environments. |
Desik Rengarajan · Nitin Ragothaman · Dileep Kalathil · Srinivas Shakkottai 🔗 |
-
|
Federated Learning with Regularized Client Participation
(
Poster
)
link »
Federated Learning (FL) is a distributed machine learning approach where multiple clients work together to solve a machine learning task. One of the key challenges in FL is the issue of partial participation, which occurs when a large number of clients are involved in the training process. The traditional method to address this problem is randomly selecting a subset of clients at each communication round. In our research, we propose a new technique and design a novel regularized client participation scheme. Under this scheme, each client joins the learning process every $R$ communication rounds, which we refer to as a meta epoch. We have found that this participation scheme leads to a reduction in the variance caused by client sampling. Combined with the popular FedAvg algorithm (McMahan et al., 2017), it results in superior rates under standard assumptions. For instance, the optimization term in our main convergence bound decreases linearly with the product of the number of communication rounds and the size of the local dataset of each client, and the statistical term scales with step size quadratically instead of linearly (the case for client sampling with replacement), leading to better convergence rate $\mathcal{O}\left(1 / T^2\right)$ compared to $\mathcal{O}(1 / T)$, where $T$ is the total number of communication rounds. Furthermore, our results permit arbitrary client availability as long as each client is available for training once per each meta epoch. Finally, we corroborate our results with experiments.
|
Grigory Malinovsky · Samuel Horváth · Konstantin Burlachenko · Peter Richtarik 🔗 |
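The regularized participation scheme can be sketched as a deterministic cohort schedule. The `i % R` cohort assignment below is one arbitrary instantiation; per the abstract, any assignment in which each client participates exactly once per meta epoch of $R$ rounds qualifies.

```python
def make_schedule(num_clients, R):
    """Partition clients into R cohorts: client i trains in round t
    iff t % R == i % R, so every client participates exactly once per
    meta epoch of R communication rounds."""
    return [[i for i in range(num_clients) if i % R == r] for r in range(R)]

def participants(schedule, t):
    """Clients selected for communication round t."""
    return schedule[t % len(schedule)]
```

Compared with uniform client sampling, this removes the randomness of participation entirely, which is exactly the source of the variance reduction the abstract describes.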
-
|
Federated Optimization Algorithms with Random Reshuffling and Gradient Compression
(
Poster
)
link »
Gradient compression is a popular technique for improving the communication complexity of stochastic first-order methods in distributed training of machine learning models. However, the existing works consider only with-replacement sampling of stochastic gradients. In contrast, it is well known in practice, and recently confirmed in theory, that stochastic methods based on without-replacement sampling, e.g., the Random Reshuffling (RR) method, perform better than those that sample the gradients with replacement. In this work, we close this gap in the literature and provide the first analysis of methods with gradient compression and without-replacement sampling. We first develop a distributed variant of random reshuffling with gradient compression (Q-RR), and show how to reduce the variance coming from gradient quantization through the use of control iterates. Next, to better fit Federated Learning applications, we incorporate local computation and propose a variant of Q-RR called Q-NASTYA. Q-NASTYA uses local gradient steps and different local and global stepsizes. We then show how to reduce compression variance in this setting as well. Finally, we prove convergence results for the proposed methods and outline several settings in which they improve upon existing algorithms. |
Abdurakhmon Sadiev · Grigory Malinovsky · Eduard Gorbunov · Igor Sokolov · Ahmed Khaled · Konstantin Burlachenko · Peter Richtarik 🔗 |
-
|
Momentum Provably Improves Error Feedback!
(
Poster
)
link »
Due to the high communication overhead when training machine learning models in a distributed environment, modern algorithms invariably rely on lossy communication compression. However, when untreated, the errors caused by compression propagate, and can lead to severely unstable behavior, including exponential divergence. Almost a decade ago, Seide et al. (2014) proposed an error feedback (EF) mechanism, which we refer to as EF14, as an immensely effective heuristic for mitigating this issue. However, despite steady algorithmic and theoretical advances in the EF field in the last decade, our understanding is far from complete. In this work we address one of the most pressing issues. In particular, in the canonical nonconvex setting, all known variants of EF rely on very large batch sizes to converge, which can be prohibitive in practice. We propose a surprisingly simple fix which removes this issue both theoretically, and in practice: the application of Polyak's momentum to the latest incarnation of EF due to Richtárik et al. (2021) known as EF21. Our algorithm, for which we coin the name EF21-SGDM, improves the communication and sample complexities of previous error feedback algorithms under standard smoothness and bounded variance assumptions, and does not require any further strong assumptions such as bounded gradient dissimilarity. Moreover, we propose a double momentum version of our method that improves the complexities even further. Our proof seems to be novel even when compression is removed from the method, and as such, our proof technique is of independent interest in the study of nonconvex stochastic optimization enriched with Polyak's momentum. |
Ilyas Fatkhullin · Alexander Tyurin · Peter Richtarik 🔗 |
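A toy sketch of error feedback with momentum in the spirit of EF21-SGDM: each worker compresses the difference between its momentum buffer and its error-feedback state, and the server averages the states. This is illustrative only; the compressor, momentum convention, and step sizes below are arbitrary choices, not the paper's.

```python
def top1(u):
    """Contractive top-1 compressor: keep the largest-magnitude entry."""
    j = max(range(len(u)), key=lambda i: abs(u[i]))
    out = [0.0] * len(u)
    out[j] = u[j]
    return out

def ef21_sgdm(grads, x, gamma=0.05, eta=0.9, iters=1000):
    """EF21-style error feedback with momentum (toy, full gradients):
    v_i tracks the local gradient, g_i is the error-feedback state, and
    only the compressed correction C(v_i - g_i) is communicated."""
    n, d = len(grads), len(x)
    v = [[0.0] * d for _ in range(n)]  # momentum buffers
    g = [[0.0] * d for _ in range(n)]  # error-feedback states
    for _ in range(iters):
        for i in range(n):
            gi = grads[i](x)
            v[i] = [(1 - eta) * a + eta * b for a, b in zip(v[i], gi)]
            c = top1([a - b for a, b in zip(v[i], g[i])])
            g[i] = [a + b for a, b in zip(g[i], c)]
        mean_g = [sum(g[i][j] for i in range(n)) / n for j in range(d)]
        x = [a - gamma * b for a, b in zip(x, mean_g)]
    return x
```

On two simple quadratics, the iterates approach the minimizer of the average objective despite each worker transmitting a single coordinate per round.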
-
|
Privacy-Preserving Federated Heavy Hitter Analytics for Non-IID Data
(
Poster
)
link »
Federated heavy hitter analytics involves the identification of the most frequent items within distributed data. Existing methods for this task often encounter challenges such as compromising privacy or sacrificing utility. To address these issues, we introduce a novel privacy-preserving algorithm that exploits the hierarchical structure to discover local and global heavy hitters in non-IID data by utilizing perturbation and similarity techniques. We conduct extensive evaluations on both synthetic and real datasets to validate the effectiveness of our approach. We also present FedCampus, a demonstration application to showcase the capabilities of our algorithm in analyzing population statistics. |
Jiaqi Shao · Shanshan Han · Chaoyang He · Bing Luo 🔗 |
-
|
SCAFF-PD: Communication Efficient Fair and Robust Federated Learning
(
Poster
)
link »
We present SCAFF-PD, a fast and communication-efficient algorithm for distributionally robust federated learning. Our approach improves fairness by optimizing a family of distributionally robust objectives tailored to heterogeneous clients. We leverage the special structure of these objectives and design an accelerated primal-dual (APD) algorithm which uses bias-corrected local steps (as in Scaffold) to achieve significant gains in communication efficiency and convergence speed. We evaluate SCAFF-PD on several benchmark datasets and demonstrate its effectiveness in improving fairness and robustness while maintaining competitive accuracy. Our results suggest that SCAFF-PD is a promising approach for federated learning in resource-constrained and heterogeneous settings. |
Yaodong Yu · Sai Praneeth Karimireddy · Yi Ma · Michael Jordan 🔗 |
-
|
ELF: Federated Langevin Algorithms with Primal, Dual and Bidirectional Compression
(
Poster
)
link »
Federated sampling algorithms have recently gained great popularity in the machine learning and statistics communities. This paper studies variants of such algorithms called Error Feedback Langevin algorithms (ELF). In particular, we analyze the combinations of EF21 and EF21-P with federated Langevin Monte Carlo. We propose three algorithms: P-ELF, D-ELF, and B-ELF, which use, respectively, primal, dual, and bidirectional compressors. We analyze the proposed methods under the log-Sobolev inequality and provide non-asymptotic convergence guarantees. |
Avetik Karagulyan · Peter Richtarik 🔗 |
-
|
Differentially Private Heavy Hitters using Federated Analytics
(
Poster
)
link »
We study practical heuristics to improve the performance of prefix-tree based algorithms for differentially private heavy hitter detection. Our model assumes each user has multiple data points and the goal is to learn as many of the most frequent data points as possible across all users' data with aggregate and local differential privacy. |
Karan Chadha · Junye Chen · John Duchi · Vitaly Feldman · Hanieh Hashemi · Omid Javidbakht · Audra McMillan · Kunal Talwar 🔗 |
-
|
Federated Conformal Predictors for Distributed Uncertainty Quantification
(
Poster
)
link »
Conformal prediction is a popular paradigm for providing rigorous uncertainty quantification that can be applied to already trained models. We present an extension of conformal prediction to federated learning. The main challenge is data heterogeneity across the clients, which violates the fundamental tenet of exchangeability required for conformal prediction. Instead, we propose a weaker notion of partial exchangeability, which is better suited to the FL setting, and use it to develop the Federated Conformal Prediction (FCP) framework. We show FCP enjoys rigorous theoretical guarantees as well as excellent empirical performance on several computer vision and medical imaging datasets. Our results demonstrate a practical approach to incorporating meaningful uncertainty quantification in distributed and heterogeneous environments. |
Charles Lu · Yaodong Yu · Sai Praneeth Karimireddy · Michael Jordan · Ramesh Raskar 🔗 |
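Under the (stronger) assumption of full exchangeability across clients, the basic construction extended here can be sketched as follows: clients share only calibration scores, and the server forms prediction sets from a corrected quantile. Function names are illustrative, and the paper's FCP quantile rule for partial exchangeability is more refined than this sketch.

```python
import math

def client_scores(probs, labels):
    """Nonconformity score s = 1 - p_model(true label | x), computed locally."""
    return [1.0 - p[y] for p, y in zip(probs, labels)]

def conformal_threshold(all_client_scores, alpha):
    """Server side: finite-sample-corrected (1 - alpha) quantile of the
    pooled calibration scores. Assumes full exchangeability for this
    sketch; FCP itself relies on a weaker partial-exchangeability notion."""
    pooled = sorted(s for scores in all_client_scores for s in scores)
    n = len(pooled)
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    return pooled[k]

def prediction_set(probs, threshold):
    """All labels whose nonconformity score does not exceed the threshold."""
    return [y for y, p in enumerate(probs) if 1.0 - p <= threshold]
```

Only scalar scores cross the network, so the communication cost is independent of the model size.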
-
|
A Convergent Federated Clustering Algorithm without Initial Condition
(
Poster
)
link »
In this paper, we define a new clustering framework for FL based on the (optimal) local models of the users: two users belong to the same cluster if their local models are close. We propose an algorithm, Successive Refine Federated Clustering Algorithm (SR-FCA), that treats each user as a singleton cluster at initialization, and then successively refines the cluster estimates by exploiting similarity with other users. In any intermediate step, SR-FCA uses an error-tolerant federated learning algorithm within each cluster to exploit simultaneous training and to correct clustering errors. Unlike some prominent prior works, such as Ghosh et al. (2021), SR-FCA does not require any good initialization (or warm start), both in theory and practice. We show that, with a proper choice of learning rate, SR-FCA incurs arbitrarily small clustering error. Additionally, SR-FCA does not require knowledge of the number of clusters a priori, unlike some prior works. We also validate the performance of our algorithm on real-world FL datasets, including FEMNIST and Shakespeare, on non-convex problems, and show the benefits of SR-FCA over several baselines. |
Harsh Vardhan · Avishek Ghosh · Arya Mazumdar 🔗 |
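The initialization-free refinement idea can be sketched as follows: start from singleton clusters and merge users whose local models are close (connected components of the similarity graph). This is illustrative only; the actual SR-FCA alternates such refinements with error-tolerant federated training inside each cluster.

```python
def model_dist(w1, w2):
    """Euclidean distance between two (flattened) model parameter vectors."""
    return sum((a - b) ** 2 for a, b in zip(w1, w2)) ** 0.5

def refine_clusters(models, threshold):
    """One refinement step: treat each user as a singleton cluster, then
    merge users whose local models are within `threshold` of each other,
    via union-find over the pairwise similarity graph."""
    n = len(models)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if model_dist(models[i], models[j]) <= threshold:
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Note that neither the number of clusters nor a warm start is needed: the structure emerges from the pairwise distances alone.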
-
|
Sketch-and-Project Meets Newton Method: Global $\mathcal O \left( k^{-2} \right)$ Convergence with Low-Rank Updates
(
Poster
)
link »
In this paper, we propose the first sketch-and-project Newton method with a fast $\mathcal O \left( k^{-2} \right)$ global convergence rate that uses low-rank updates. Our method, SGN, can be viewed in three ways: i) as a sketch-and-project algorithm projecting updates of the Newton method, ii) as a cubically regularized Newton method in sketched subspaces, and iii) as a damped Newton method in sketched subspaces. SGN inherits the best of all three worlds: the cheap iteration costs of sketch-and-project methods (up to $\mathcal O(1)$), the state-of-the-art $\mathcal O \left( k^{-2} \right)$ global convergence rate of full-rank Newton-like methods, and the algorithmic simplicity of damped Newton methods. Finally, we demonstrate that its empirical performance is comparable to that of baseline algorithms.
|
Slavomír Hanzely 🔗 |
-
|
Towards a Better Theoretical Understanding of Independent Subnetwork Training
(
Poster
)
link »
Modern advancements in large-scale machine learning would be impossible without the paradigm of data-parallel distributed computing. Since distributed computing with large-scale models imparts excessive pressure on communication channels, a lot of recent research was directed towards co-designing communication compression strategies and training algorithms with the goal of reducing communication costs. While pure data parallelism allows better data scaling, it suffers from poor model scaling properties. Indeed, compute nodes are severely limited by memory constraints, preventing further increases in model size. For this reason, the latest achievements in training giant neural network models rely on some form of model parallelism as well. In this work, we take a closer theoretical look at Independent Subnetwork Training (IST), which is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model. |
Egor Shulgin · Peter Richtarik 🔗 |
-
|
Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes
(
Poster
)
link »
In this work, we consider the problem of minimizing the sum of Moreau envelopes of given functions, which has previously appeared in the context of meta-learning and personalized federated learning. In contrast to the existing theory, which requires running subsolvers until a certain precision is reached, we only assume that a finite number of gradient steps is taken at each iteration. As a special case, our theory allows us to show the convergence of First-Order Model-Agnostic Meta-Learning (FO-MAML) to the vicinity of a solution of the Moreau objective. We also study a more general family of first-order algorithms that can be viewed as a generalization of FO-MAML. Our main theoretical contribution is an improvement upon the inexact SGD framework. In particular, our perturbed-iterate analysis allows for tighter guarantees that improve the dependency on the problem's conditioning. In contrast to related work on meta-learning, ours does not require any assumptions on Hessian smoothness, and can leverage the smoothness and convexity of the reformulation based on Moreau envelopes. Furthermore, to fill the gaps in the comparison of FO-MAML to Implicit MAML (iMAML), we show that the objective of iMAML is neither smooth nor convex, implying that it has no convergence guarantees based on the existing theory. |
Konstantin Mishchenko · Slavomír Hanzely · Peter Richtarik 🔗 |
-
|
Strategic Data Sharing between Competitors
(
Poster
)
link »
Collaborative learning techniques have significantly advanced in recent years, enabling private model training across multiple organizations. Despite this opportunity, firms face a dilemma when considering data sharing with competitors—while collaboration can improve a company’s machine learning model, it may also benefit competitors and hence reduce profits. In this work, we introduce a general framework for analyzing this data-sharing trade-off. The framework consists of three components, representing the firms’ production decisions, the effect of additional data on model quality, and the data-sharing negotiation process, respectively. We then study an instantiation of the framework, based on a conventional market model from economic theory, to identify key factors that affect collaboration incentives. Our findings indicate a profound impact of market conditions on the data-sharing incentives. In particular, we find that reduced competition, in terms of the similarities between the firms’ products, and harder learning tasks foster collaboration. |
Nikita Tsoy · Nikola Konstantinov 🔗 |
-
|
Evaluating and Incentivizing Diverse Data Contributions in Collaborative Learning
(
Poster
)
link »
For a federated learning model to perform well, it is crucial to have a diverse and representative dataset. However, the data contributors may only be concerned with the performance on a specific subset of the population, which may not reflect the diversity of the wider population. This creates a tension between the principal (the FL platform designer) who cares about global performance and the agents (the data collectors) who care about local performance and will hence only collect locally useful data. In this work, we formulate this tension as a game between the principal and multiple agents, and focus on the linear experiment design problem to formally study their interaction. We show that the statistical criterion used to quantify the diversity of the data, as well as the choice of the federated learning algorithm used, has a significant effect on the resulting equilibrium. We leverage this to design simple optimal federated learning mechanisms that encourage data collectors to contribute data representative of the global population, thereby maximizing global performance. |
Baihe Huang · Sai Praneeth Karimireddy · Michael Jordan 🔗 |
-
|
Clustering-Guided Federated Learning of Representations
(
Poster
)
link »
Federated self-supervised learning (FedSSL) methods have proven to be very useful in learning unlabeled data that is distributed to multiple clients, possibly heterogeneously. However, there is still a lot of room for improvement for FedSSL methods, especially for the case of highly heterogeneous data and a large number of classes. In this paper, we introduce federated representation learning through clustering (FedRLC) scheme that utilizes i) a crossed KL divergence loss with a data selection strategy during local training and ii) a dynamic upload on local cluster centers during communication updates. Experimental results show that FedRLC achieves state-of-the-art results on widely used benchmarks even with highly heterogeneous settings and datasets with a large number of classes such as CIFAR-100. |
Runxuan Miao · Erdem Koyuncu 🔗 |
-
|
Concept-aware clustering for decentralized deep learning under temporal shift
(
Poster
)
link »
Decentralized deep learning requires dealing with non-iid data across clients, which may also change over time due to temporal shifts. While non-iid data has been extensively studied in distributed settings, temporal shifts have received no attention. To the best of our knowledge, we are the first to tackle the novel and challenging problem of decentralized learning with non-iid and dynamic data. We propose a novel algorithm that can automatically discover and adapt to the evolving concepts in the network, without any prior knowledge or estimation of the number of concepts. We evaluate our algorithm on standard benchmark datasets and demonstrate that it outperforms previous methods for decentralized learning. |
Edvin Listo Zec · Emilie Klefbom · Marcus Toftås · Martin Willbo · Olof Mogren 🔗 |
-
|
Randomized Quantization is All You Need for Differential Privacy in Federated Learning
(
Poster
)
link »
Federated learning (FL) is a common and practical framework for learning a machine learning model in a decentralized fashion. A primary motivation behind this decentralized approach is data privacy, ensuring that the learner never sees the data of each local source itself. Federated learning then comes with two major challenges: one is handling potentially complex model updates between a server and a large number of data sources; the other is that decentralization may, in fact, be insufficient for privacy, as the local updates themselves can reveal information about the sources' data. To address these issues, we consider an approach to federated learning that combines quantization and differential privacy. Absent privacy, federated learning often relies on quantization to reduce communication complexity. We build upon this approach and develop a new algorithm called the Randomized Quantization Mechanism (RQM), which obtains privacy through two levels of randomization. More precisely, we randomly sub-sample feasible quantization levels, then employ a randomized rounding procedure using these sub-sampled discrete levels. We establish that our results preserve Renyi differential privacy (Renyi DP). We empirically study the performance of our algorithm and demonstrate that, compared to previous work, it yields improved privacy-accuracy trade-offs for DP federated learning. To the best of our knowledge, this is the first study that solely relies on randomized quantization, without incorporating explicit discrete noise, to achieve Renyi DP guarantees in federated learning systems. |
Yeojoon Youn · Zihao Hu · Juba Ziani · Jacob Abernethy 🔗 |
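The two levels of randomization described above can be sketched as follows. The parameter choices are illustrative; the paper calibrates the sub-sampling and rounding to meet a target Renyi DP budget.

```python
import random

def rqm_quantize(x, levels, keep, rng):
    """Randomized Quantization Mechanism sketch: (1) sub-sample `keep`
    of the feasible quantization levels, (2) apply unbiased randomized
    rounding of x between its two nearest retained levels. Parameter
    names and values are illustrative, not the paper's calibration."""
    chosen = sorted(rng.sample(levels, keep))
    # clamp into the retained range, then find the bracketing pair
    x = min(max(x, chosen[0]), chosen[-1])
    for lo, hi in zip(chosen, chosen[1:]):
        if lo <= x <= hi:
            if hi == lo:
                return lo
            p = (x - lo) / (hi - lo)       # unbiased rounding probability
            return hi if rng.random() < p else lo
    return chosen[-1]
```

The output is always one of the feasible levels, and within the retained range the rounding is unbiased, so averages of many quantized reports stay close to the true value.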
-
|
Privacy Auditing with One (1) Training Run
(
Poster
)
link »
We propose a scheme for auditing differentially private machine learning systems with a single training run. This exploits the parallelism of being able to add or remove multiple training examples independently. We analyze this using the connection between differential privacy and statistical generalization, which avoids the cost of group privacy. Our auditing scheme requires minimal assumptions about the algorithm and can be applied in the black-box (i.e., central DP) or white-box (i.e., federated learning) setting. We demonstrate the effectiveness of our framework by applying it to DP-SGD, where we can achieve meaningful empirical privacy lower bounds by training only one model, where standard methods would require training hundreds of models. |
Thomas Steinke · Milad Nasresfahani · Matthew Jagielski 🔗 |
-
|
On the Performance of Gradient Tracking with Local Updates
(
Poster
)
link »
We study the decentralized optimization problem where a network of $n$ agents seeks to minimize the average of a set of heterogeneous non-convex cost functions distributedly. State-of-the-art decentralized algorithms like Exact Diffusion and Gradient Tracking (GT) involve communicating every iteration. However, communication is expensive, resource intensive, and slow. This work analyzes a locally updated GT method (LU-GT), where agents perform local recursions before interacting with their neighbors. While local updates have been shown to reduce communication overhead in practice, their theoretical influence has not been fully characterized. We show LU-GT has the same communication complexity as the Federated Learning setting but allows for decentralized (symmetric) network topologies and prove that the number of local updates does not degrade the quality of the solution achieved by LU-GT.
|
Edward Duc Hien Nguyen · Sulaiman Alghunaim · Kun Yuan · Cesar Uribe 🔗 |
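The local-updates-then-track pattern described above can be illustrated on scalar quadratics $f_i(x) = \frac{1}{2}(x - c_i)^2$ with fully connected mixing. This is a toy sketch: the mixing matrix, step sizes, and round counts are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def lu_gt(c, num_rounds=200, tau=5, eta=0.05):
    """Toy sketch of locally updated gradient tracking (LU-GT) on scalar
    quadratics f_i(x) = 0.5 * (x - c[i])**2. Each agent takes tau local
    steps using its gradient tracker y_i, then the network mixes iterates
    and trackers and refreshes the trackers with fresh local gradients."""
    n = len(c)
    W = np.full((n, n), 1.0 / n)   # doubly stochastic mixing matrix (complete graph)
    x = np.zeros(n)
    g = x - c                      # local gradients of the quadratics
    y = g.copy()                   # trackers initialized to local gradients
    for _ in range(num_rounds):
        for _ in range(tau):       # local recursions between communications
            x = x - eta * y
        x = W @ x                  # one communication round: mix iterates
        g_new = x - c
        y = W @ y + g_new - g      # gradient-tracking correction
        g = g_new
    return x
```

The tracker update preserves the invariant that the trackers sum to the sum of current local gradients, so every agent converges to the minimizer of the average cost, the mean of `c`, despite heterogeneous local objectives.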
-
|
Resource-Efficient Federated Learning
(
Poster
)
link »
Federated Learning (FL) is a distributed training paradigm that avoids sharing users' private data. FL presents unique challenges in dealing with data, device, and user heterogeneity, which impact both model quality and training time; the impact is exacerbated by the scale of deployments. More importantly, existing FL methods result in inefficient use of resources and prolonged training times. In this work, we propose REFL, a resource-efficient federated learning system that systematically addresses the question of resource efficiency in FL, showing the benefits of intelligent participant selection and of incorporating updates from straggling participants. REFL maximizes resource efficiency without compromising statistical and system efficiency. REFL is released as open source at https://github.com/ahmedcs/REFL. |
Ahmed M. Abdelmoniem · Atal Sahu · Marco Canini · Suhaib Fahmy 🔗 |
-
|
$\texttt{FED-CURE}$: A Robust Federated Learning Algorithm with Cubic Regularized Newton
(
Poster
)
link »
In this paper, we analyze the cubic-regularized Newton method, which avoids saddle points in non-convex optimization, in the Federated Learning (FL) framework, while simultaneously addressing several practical challenges that naturally arise in FL, such as communication bottlenecks and Byzantine attacks. We propose FEDerated CUbic REgularized Newton $(\texttt{FED-CURE})$ and obtain convergence guarantees in several settings. As a second-order algorithm, $\texttt{FED-CURE}$ has a much lower iteration complexity than its first-order counterparts; furthermore, we can use compression (or sparsification) techniques such as $\delta$-approximate compression for communication efficiency and norm-based thresholding for Byzantine resilience. We validate the performance of $\texttt{FED-CURE}$ with experiments on standard datasets under several types of Byzantine attacks, and obtain a $25\%$ improvement over first-order methods in total iteration complexity.
|
Avishek Ghosh · Raj Kumar Maity · Arya Mazumdar 🔗 |
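The norm-based thresholding mentioned above for Byzantine resilience admits a very short sketch; the threshold value and plain averaging of the survivors are illustrative choices, not the paper's exact aggregation rule.

```python
import numpy as np

def norm_thresholded_mean(updates, tau):
    """Sketch of norm-based thresholding for Byzantine resilience: the
    server averages only the client updates whose norm is at most the
    threshold tau, discarding suspiciously large (potentially Byzantine)
    ones. The threshold and plain averaging are illustrative choices."""
    kept = [u for u in updates if np.linalg.norm(u) <= tau]
    if not kept:  # fall back if every update was filtered out
        return np.zeros_like(updates[0])
    return np.mean(kept, axis=0)
```

A single adversarial client sending an arbitrarily large update then has no effect on the aggregate, since its norm exceeds the threshold.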
-
|
Neighborhood Gradient Clustering: An Efficient Decentralized Learning Method for Non-IID Data
(
Poster
)
link »
Decentralized learning algorithms enable the training of deep learning models over large distributed datasets without the need for a central server. In practical scenarios, the distributed datasets can have significantly different data distributions across the agents. In this paper, we propose Neighborhood Gradient Clustering (NGC), a novel decentralized learning algorithm that improves decentralized learning over non-IID data. Specifically, the proposed method replaces the local gradients of the model with the weighted mean of self-gradients, model-variant cross-gradients, and data-variant cross-gradients. Model-variant cross-gradients are derivatives of the received neighbors' model parameters with respect to the local dataset, computed locally. Data-variant cross-gradients are derivatives of the local model with respect to its neighbors' datasets, received through communication. We demonstrate the efficiency of NGC over non-IID data sampled from various vision datasets. Our experiments show that the proposed method either remains competitive with or outperforms (by up to 6%) the existing state of the art (SoTA), with significantly lower compute and memory requirements. |
Sai Aparna Aketi · Sangamesh Kodge · Kaushik Roy 🔗 |
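The gradient replacement described above can be sketched as a weighted mean over the three gradient families. Splitting the weight `alpha` equally between the two cross-gradient terms is an illustrative assumption, not necessarily the paper's weighting.

```python
import numpy as np

def ngc_gradient(self_grad, model_variant_grads, data_variant_grads, alpha=0.5):
    """Sketch of the NGC gradient replacement: a weighted mean of the
    self-gradient, the model-variant cross-gradients (neighbors' models
    evaluated on the local data, computed locally), and the data-variant
    cross-gradients (the local model evaluated on neighbors' data, received
    via communication). The equal split of alpha between the two
    cross-gradient terms is an illustrative choice."""
    mv = np.mean(model_variant_grads, axis=0)
    dv = np.mean(data_variant_grads, axis=0)
    return (1.0 - alpha) * self_grad + 0.5 * alpha * mv + 0.5 * alpha * dv
```

When all three gradient families agree (the IID case), the replacement reduces to the ordinary local gradient; the cross terms matter precisely when neighbors' data distributions differ.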
-
|
Asynchronous Federated Learning with Bidirectional Quantized Communications and Buffered Aggregation
(
Poster
)
link »
Asynchronous Federated Learning with Buffered Aggregation (FedBuff) is a state-of-the-art algorithm known for its efficiency and high scalability. However, it has a high communication cost, which has not been examined with quantized communications. To tackle this problem, we present a new algorithm (QAFeL) with a quantization scheme that establishes a shared "hidden" state between the server and clients to avoid the error propagation caused by direct quantization. This approach allows for high precision while significantly reducing the data transmitted during client-server interactions. We provide theoretical convergence guarantees for QAFeL and corroborate our analysis with experiments on a standard benchmark. |
Tomas Ortega · Hamid Jafarkhani 🔗 |
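The shared-hidden-state idea described above can be sketched as follows: the sender transmits only the quantized difference between its value and the shared state, and both ends apply the identical update, so the two copies never drift apart and quantization error does not accumulate. The uniform quantizer and its step size are illustrative assumptions, not QAFeL's exact scheme.

```python
import numpy as np

class QuantizedLink:
    """Sketch of quantized communication through a shared hidden state.
    Only the quantized residual crosses the wire; sender and receiver
    apply the same update, keeping their states synchronized."""

    def __init__(self, dim, step=0.1):
        self.sender_state = np.zeros(dim)    # sender's copy of the shared state
        self.receiver_state = np.zeros(dim)  # receiver's copy (kept identical)
        self.step = step                     # uniform quantizer step (illustrative)

    def transmit(self, x):
        # Quantize the residual, not the raw value.
        q = self.step * np.round((x - self.sender_state) / self.step)
        self.sender_state = self.sender_state + q
        self.receiver_state = self.receiver_state + q  # identical update on both ends
        return self.receiver_state  # receiver's estimate of x
```

Because each round quantizes only the residual to the shared state, the per-round reconstruction error stays bounded by half a quantization step instead of compounding across rounds.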
-
|
Population Expansion for Training Language Models with Private Federated Learning
(
Poster
)
link »
Federated learning (FL) combined with differential privacy (DP) offers machine learning (ML) training with distributed devices and a formal privacy guarantee. With a large population of devices, FL with DP produces a performant model in a timely manner. However, for applications with a smaller population, not only does model utility degrade, as the DP noise is inversely proportional to the population size, but training latency also increases, since waiting for enough clients to become available from a smaller pool is slower. In this work, we propose expanding the population based on domain adaptation techniques to speed up training and improve final model quality when training with small populations. We empirically demonstrate that our techniques can improve utility by 13\% to 30\% on real-world language modeling datasets. |
Tatsuki Koga · Congzheng Song · Martin Pelikan · Mona Chitnis 🔗 |
Author Information
Zheng Xu (Google Research)
Peter Kairouz (Google)
Bo Li (UIUC)

Dr. Bo Li is an assistant professor in the Department of Computer Science at the University of Illinois at Urbana–Champaign. She is the recipient of the IJCAI Computers and Thought Award, Alfred P. Sloan Research Fellowship, AI’s 10 to Watch, NSF CAREER Award, MIT Technology Review TR-35 Award, Dean's Award for Excellence in Research, C.W. Gear Outstanding Junior Faculty Award, Intel Rising Star award, Symantec Research Labs Fellowship, Rising Star Award, Research Awards from Tech companies such as Amazon, Facebook, Intel, IBM, and eBay, and best paper awards at several top machine learning and security conferences. Her research focuses on both theoretical and practical aspects of trustworthy machine learning, which is at the intersection of machine learning, security, privacy, and game theory. She has designed several scalable frameworks for trustworthy machine learning and privacy-preserving data publishing. Her work has been featured by major publications and media outlets such as Nature, Wired, Fortune, and New York Times.
Tian Li (Carnegie Mellon University)
John Nguyen (Meta)
Jianyu Wang (Apple)
Shiqiang Wang (IBM Research)
Ayfer Ozgur (Stanford University)
More from the Same Authors
-
2021 : Neural Network-based Estimation of the MMSE »
Mario Diaz · Peter Kairouz · Lalitha Sankar -
2021 : Local Adaptivity in Federated Learning: Convergence and Consistency »
Jianyu Wang · Zheng Xu · Luyang Liu -
2021 : The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation »
Peter Kairouz · Ziyu Liu · Thomas Steinke -
2021 : On the Renyi Differential Privacy of the Shuffle Model »
Antonious Girgis · Deepesh Data · Suhas Diggavi · Ananda Theertha Suresh · Peter Kairouz -
2021 : Practical and Private (Deep) Learning without Sampling or Shuffling »
Peter Kairouz · Hugh B McMahan · Shuang Song · Om Dipakbhai Thakkar · Abhradeep Guha Thakurta · Zheng Xu -
2021 : Industrial Booth (IBM) »
Shiqiang Wang · Nathalie Baracaldo -
2021 : Industrial Booth (Google) »
Zheng Xu · Peter Kairouz -
2022 : Fair Universal Representations using Adversarial Models »
Monica Welfert · Peter Kairouz · Jiachun Liao · Chong Huang · Lalitha Sankar -
2022 : Group Distributionally Robust Reinforcement Learning with Hierarchical Latent Variables »
Mengdi Xu · Peide Huang · Visak Kumar · Jielin Qiu · Chao Fang · Kuan-Hui Lee · Xuewei Qi · Henry Lam · Bo Li · Ding Zhao -
2022 : Paper 10: CausalAF: Causal Autoregressive Flow for Safety-Critical Scenes Generation »
Wenhao Ding · Haohong Lin · Bo Li · Ding Zhao · Hitesh Arora -
2023 : DiffScene: Diffusion-Based Safety-Critical Scenario Generation for Autonomous Vehicles »
Chejian Xu · Ding Zhao · Alberto Sangiovanni Vincentelli · Bo Li -
2023 : Towards a Theoretical and Practical Understanding of One-Shot Federated Learning with Fisher Information »
Divyansh Jhunjhunwala · Shiqiang Wang · Gauri Joshi -
2023 : Federated Experiment Design under Distributed Differential Privacy »
Wei-Ning Chen · Graham Cormode · Akash Bharadwaj · Peter Romov · Ayfer Ozgur -
2023 : Unleashing the Power of Randomization in Auditing Differentially Private ML »
Krishna Pillutla · Galen Andrew · Peter Kairouz · Hugh B McMahan · Alina Oprea · Sewoong Oh -
2023 : Privacy Amplification via Compression: Achieving the Optimal Privacy-Accuracy-Communication Trade-off in Distributed Mean Estimation »
Wei-Ning Chen · Dan Song · Ayfer Ozgur · Peter Kairouz -
2023 : A New Theoretical Perspective on Data Heterogeneity in Federated Optimization »
Jiayi Wang · Shiqiang Wang · Rong-Rong Chen · Mingyue Ji -
2023 : Federated Heavy Hitter Recovery under Linear Sketching »
Adria Gascon · Peter Kairouz · Ziteng Sun · Ananda Suresh -
2023 : Exact Optimality in Communication-Privacy-Utility Tradeoffs »
Berivan Isik · Wei-Ning Chen · Ayfer Ozgur · Tsachy Weissman · Albert No -
2023 : Semantically Adversarial Scene Generation with Explicit Knowledge Guidance for Autonomous Driving »
Wenhao Ding · Haohong Lin · Bo Li · Ding Zhao -
2023 : Can Public Large Language Models Help Private Cross-device Federated Learning? »
Boxin Wang · Yibo J. Zhang · Yuan Cao · Bo Li · Hugh B McMahan · Sewoong Oh · Zheng Xu · Manzil Zaheer -
2023 : Local Differential Privacy with Entropic Wasserstein Distance »
Daria Reshetova · Wei-Ning Chen · Ayfer Ozgur -
2023 : Visual-based Policy Learning with Latent Language Encoding »
Jielin Qiu · Mengdi Xu · William Han · Bo Li · Ding Zhao -
2023 : Can Brain Signals Reveal Inner Alignment with Human Languages? »
Jielin Qiu · William Han · Jiacheng Zhu · Mengdi Xu · Douglas Weber · Bo Li · Ding Zhao -
2023 : Panel Discussion »
Peter Kairouz · Song Han · Kamalika Chaudhuri · Florian Tramer -
2023 Workshop: Knowledge and Logical Reasoning in the Era of Data-driven Learning »
Nezihe Merve Gürel · Bo Li · Theodoros Rekatsinas · Beliz Gunel · Alberto Sangiovanni Vincentelli · Paroma Varma -
2023 : Introduction and Opening Remarks »
Zheng Xu -
2023 Poster: Beyond Uniform Lipschitz Condition in Differentially Private Optimization »
Rudrajit Das · Satyen Kale · Zheng Xu · Tong Zhang · Sujay Sanghavi -
2023 Poster: UMD: Unsupervised Model Detection for X2X Backdoor Attacks »
Zhen Xiang · Zidi Xiong · Bo Li -
2023 Poster: Federated Heavy Hitter Recovery under Linear Sketching »
Adria Gascon · Peter Kairouz · Ziteng Sun · Ananda Suresh -
2023 Poster: On the Convergence of Federated Averaging with Cyclic Client Participation »
Yae Jee Cho · PRANAY SHARMA · Gauri Joshi · Zheng Xu · Satyen Kale · Tong Zhang -
2023 Poster: LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning »
Timothy Castiglia · Yi Zhou · Shiqiang Wang · Swanand Kadhe · Nathalie Baracaldo · Stacy Patterson -
2023 Poster: Interpolation for Robust Learning: Data Augmentation on Wasserstein Geodesics »
Jiacheng Zhu · Jielin Qiu · Aritra Guha · Zhuolin Yang · XuanLong Nguyen · Bo Li · Ding Zhao -
2023 Poster: Private Federated Learning with Autotuned Compression »
Enayat Ullah · Christopher Choquette-Choo · Peter Kairouz · Sewoong Oh -
2023 Poster: Algorithms for bounding contribution for histogram estimation under user-level privacy »
Yuhan Liu · Ananda Suresh · Wennan Zhu · Peter Kairouz · Marco Gruteser -
2023 Poster: Reconstructive Neuron Pruning for Backdoor Defense »
Yige Li · XIXIANG LYU · Xingjun Ma · Nodens Koren · Lingjuan Lyu · Bo Li · Yu-Gang Jiang -
2023 Tutorial: How to DP-fy ML: A Practical Tutorial to Machine Learning with Differential Privacy »
Sergei Vassilvitskii · Natalia Ponomareva · Zheng Xu -
2022 : Paper 15: On the Robustness of Safe Reinforcement Learning under Observational Perturbations »
Zuxin Liu · Zhepeng Cen · Huan Zhang · Jie Tan · Bo Li · Ding Zhao -
2022 Poster: Compressed-VFL: Communication-Efficient Learning with Vertically Partitioned Data »
Timothy Castiglia · Anirban Das · Shiqiang Wang · Stacy Patterson -
2022 Poster: Constrained Variational Policy Optimization for Safe Reinforcement Learning »
Zuxin Liu · Zhepeng Cen · Vladislav Isenbaev · Wei Liu · Steven Wu · Bo Li · Ding Zhao -
2022 Poster: Provable Domain Generalization via Invariant-Feature Subspace Recovery »
Haoxiang Wang · Haozhe Si · Bo Li · Han Zhao -
2022 Spotlight: Compressed-VFL: Communication-Efficient Learning with Vertically Partitioned Data »
Timothy Castiglia · Anirban Das · Shiqiang Wang · Stacy Patterson -
2022 Spotlight: Constrained Variational Policy Optimization for Safe Reinforcement Learning »
Zuxin Liu · Zhepeng Cen · Vladislav Isenbaev · Wei Liu · Steven Wu · Bo Li · Ding Zhao -
2022 Spotlight: Provable Domain Generalization via Invariant-Feature Subspace Recovery »
Haoxiang Wang · Haozhe Si · Bo Li · Han Zhao -
2022 Poster: How to Steer Your Adversary: Targeted and Efficient Model Stealing Defenses with Gradient Redirection »
Mantas Mazeika · Bo Li · David Forsyth -
2022 Poster: The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning »
Wei-Ning Chen · Christopher Choquette Choo · Peter Kairouz · Ananda Suresh -
2022 Poster: Private Adaptive Optimization with Side information »
Tian Li · Manzil Zaheer · Sashank Jakkam Reddi · Virginia Smith -
2022 Poster: The Poisson Binomial Mechanism for Unbiased Federated Learning with Secure Aggregation »
Wei-Ning Chen · Ayfer Ozgur · Peter Kairouz -
2022 Poster: Adversarially Robust Models may not Transfer Better: Sufficient Conditions for Domain Transferability from the View of Regularization »
Xiaojun Xu · Yibo Zhang · Evelyn Ma · Hyun Ho Son · Sanmi Koyejo · Bo Li -
2022 Poster: Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond »
Haoxiang Wang · Bo Li · Han Zhao -
2022 Spotlight: Private Adaptive Optimization with Side information »
Tian Li · Manzil Zaheer · Sashank Jakkam Reddi · Virginia Smith -
2022 Spotlight: The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning »
Wei-Ning Chen · Christopher Choquette Choo · Peter Kairouz · Ananda Suresh -
2022 Oral: The Poisson Binomial Mechanism for Unbiased Federated Learning with Secure Aggregation »
Wei-Ning Chen · Ayfer Ozgur · Peter Kairouz -
2022 Spotlight: How to Steer Your Adversary: Targeted and Efficient Model Stealing Defenses with Gradient Redirection »
Mantas Mazeika · Bo Li · David Forsyth -
2022 Spotlight: Adversarially Robust Models may not Transfer Better: Sufficient Conditions for Domain Transferability from the View of Regularization »
Xiaojun Xu · Yibo Zhang · Evelyn Ma · Hyun Ho Son · Sanmi Koyejo · Bo Li -
2022 Spotlight: Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond »
Haoxiang Wang · Bo Li · Han Zhao -
2022 Poster: Certifying Out-of-Domain Generalization for Blackbox Functions »
Maurice Weber · Linyi Li · Boxin Wang · Zhikuan Zhao · Bo Li · Ce Zhang -
2022 Poster: Double Sampling Randomized Smoothing »
Linyi Li · Jiawei Zhang · Tao Xie · Bo Li -
2022 Poster: TPC: Transformation-Specific Smoothing for Point Cloud Models »
Wenda Chu · Linyi Li · Bo Li -
2022 Spotlight: TPC: Transformation-Specific Smoothing for Point Cloud Models »
Wenda Chu · Linyi Li · Bo Li -
2022 Spotlight: Double Sampling Randomized Smoothing »
Linyi Li · Jiawei Zhang · Tao Xie · Bo Li -
2022 Spotlight: Certifying Out-of-Domain Generalization for Blackbox Functions »
Maurice Weber · Linyi Li · Boxin Wang · Zhikuan Zhao · Bo Li · Ce Zhang -
2021 : Closing Remarks »
Shiqiang Wang · Nathalie Baracaldo · Olivia Choudhury · Gauri Joshi · Peter Richtarik · Praneeth Vepakomma · Han Yu -
2021 : Discussion Panel #2 »
Bo Li · Nicholas Carlini · Andrzej Banburski · Kamalika Chaudhuri · Will Xiao · Cihang Xie -
2021 : Industrial Panel »
Nathalie Baracaldo · Shiqiang Wang · Peter Kairouz · Zheng Xu · Kshitiz Malik · Tao Zhang -
2021 Workshop: International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2021 (FL-ICML'21) »
Nathalie Baracaldo · Olivia Choudhury · Gauri Joshi · Peter Richtarik · Praneeth Vepakomma · Shiqiang Wang · Han Yu -
2021 : Opening Remarks »
Shiqiang Wang · Nathalie Baracaldo · Olivia Choudhury · Gauri Joshi · Peter Richtarik · Praneeth Vepakomma · Han Yu -
2021 Workshop: A Blessing in Disguise: The Prospects and Perils of Adversarial Machine Learning »
Hang Su · Yinpeng Dong · Tianyu Pang · Eric Wong · Zico Kolter · Shuo Feng · Bo Li · Henry Liu · Dan Hendrycks · Francesco Croce · Leslie Rice · Tian Tian -
2021 : Contributed Talks Session 1 »
Marika Swanberg · Samuel Haney · Peter Kairouz -
2021 Poster: Uncovering the Connections Between Adversarial Transferability and Knowledge Transferability »
Kaizhao Liang · Yibo Zhang · Boxin Wang · Zhuolin Yang · Sanmi Koyejo · Bo Li -
2021 Poster: CRFL: Certifiably Robust Federated Learning against Backdoor Attacks »
Chulin Xie · Minghao Chen · Pin-Yu Chen · Bo Li -
2021 Poster: Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation »
Jiawei Zhang · Linyi Li · Huichen Li · Xiaolu Zhang · Shuang Yang · Bo Li -
2021 Poster: Practical and Private (Deep) Learning Without Sampling or Shuffling »
Peter Kairouz · Brendan McMahan · Shuang Song · Om Dipakbhai Thakkar · Abhradeep Guha Thakurta · Zheng Xu -
2021 Poster: The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation »
Peter Kairouz · Ziyu Liu · Thomas Steinke -
2021 Poster: Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation »
Haoxiang Wang · Han Zhao · Bo Li -
2021 Spotlight: Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation »
Jiawei Zhang · Linyi Li · Huichen Li · Xiaolu Zhang · Shuang Yang · Bo Li -
2021 Spotlight: The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation »
Peter Kairouz · Ziyu Liu · Thomas Steinke -
2021 Spotlight: Uncovering the Connections Between Adversarial Transferability and Knowledge Transferability »
Kaizhao Liang · Yibo Zhang · Boxin Wang · Zhuolin Yang · Sanmi Koyejo · Bo Li -
2021 Spotlight: Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation »
Haoxiang Wang · Han Zhao · Bo Li -
2021 Spotlight: Practical and Private (Deep) Learning Without Sampling or Shuffling »
Peter Kairouz · Brendan McMahan · Shuang Song · Om Dipakbhai Thakkar · Abhradeep Guha Thakurta · Zheng Xu -
2021 Spotlight: CRFL: Certifiably Robust Federated Learning against Backdoor Attacks »
Chulin Xie · Minghao Chen · Pin-Yu Chen · Bo Li -
2021 Poster: Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks »
Nezihe Merve Gürel · Xiangyu Qi · Luka Rimanic · Ce Zhang · Bo Li -
2021 Spotlight: Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks »
Nezihe Merve Gürel · Xiangyu Qi · Luka Rimanic · Ce Zhang · Bo Li -
2021 Poster: Heterogeneity for the Win: One-Shot Federated Clustering »
Don Kurian Dennis · Tian Li · Virginia Smith -
2021 Poster: Ditto: Fair and Robust Federated Learning Through Personalization »
Tian Li · Shengyuan Hu · Ahmad Beirami · Virginia Smith -
2021 Spotlight: Ditto: Fair and Robust Federated Learning Through Personalization »
Tian Li · Shengyuan Hu · Ahmad Beirami · Virginia Smith -
2021 Spotlight: Heterogeneity for the Win: One-Shot Federated Clustering »
Don Kurian Dennis · Tian Li · Virginia Smith -
2021 Expo Talk Panel: Enterprise-Strength Federated Learning: New Algorithms, New Paradigms, and a Participant-Interactive Demonstration Session »
Laura Wynter · Nathalie Baracaldo · Chaitanya Kumar · Parijat Dube · Mikhail Yurochkin · Theodoros Salonidis · Shiqiang Wang -
2021 : Adaptive Federated Learning for Communication and Computation Efficiency (2021 IEEE Leonard Prize-winning work). »
Shiqiang Wang -
2020 : Closing remarks »
Nathalie Baracaldo · Olivia Choudhury · Gauri Joshi · Ramesh Raskar · Shiqiang Wang · Han Yu -
2020 : Opening remarks »
Nathalie Baracaldo · Olivia Choudhury · Gauri Joshi · Ramesh Raskar · Shiqiang Wang · Han Yu -
2020 Workshop: Federated Learning for User Privacy and Data Confidentiality »
Nathalie Baracaldo · Olivia Choudhury · Olivia Choudhury · Gauri Joshi · Ramesh Raskar · Gauri Joshi · Shiqiang Wang · Han Yu -
2020 Poster: Improving Robustness of Deep-Learning-Based Image Reconstruction »
Ankit Raj · Yoram Bresler · Bo Li -
2020 Poster: Context Aware Local Differential Privacy »
Jayadev Acharya · Kallista Bonawitz · Peter Kairouz · Daniel Ramage · Ziteng Sun -
2019 : Poster Session »
Ivana Balazevic · Minae Kwon · Benjamin Lengerich · Amir Asiaee · Alex Lambert · Wenyu Chen · Yiming Ding · Carlos Florensa · Joseph E Gaudio · Yesmina Jaafra · Boli Fang · Ruoxi Wang · Tian Li · SWAMINATHAN GURUMURTHY · Andy Yan · Kubra Cilingir · Vithursan (Vithu) Thangarasa · Alexander Li · Ryan Lowe