Poster in Workshop: Federated Learning and Analytics in Practice: Algorithms, Systems, Applications, and Opportunities
Machine Learning with Feature Differential Privacy
Saeed Mahloujifar · Chuan Guo · G. Edward Suh · Kamalika Chaudhuri
Machine learning applications that incorporate differential privacy frequently suffer significant utility degradation. One prevalent remedy is to enhance utility through publicly accessible information. Public data points, well known for their utility-enhancing effect in private training, have received considerable attention. However, public sources can vary substantially in nature.

In this work, we explore the feasibility of leveraging public features within the private dataset. For instance, envision a tabular dataset in which some features are publicly accessible while others are kept private. We formalize this scenario through a notion we call feature DP. We study feature DP in the context of private optimization and propose a solution based on the widely used DP-SGD framework. Notably, our framework retains the benefit of privacy amplification through subsampling even while some features are disclosed.

We analyze our algorithm for Lipschitz, convex loss functions and establish privacy and excess empirical risk bounds. Importantly, because our strategy harnesses privacy amplification via subsampling, our excess risk bounds converge to zero as the number of data points increases. This enables us to improve upon previously known excess risk bounds for label differential privacy, and answers an open question posed by Ghazi et al. (2021).

We apply our methodology to the Purchase100 dataset and find that the public features leveraged by our framework can indeed improve the trade-off between utility and privacy.
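For context, the following is a minimal sketch of the standard DP-SGD mechanism (Abadi et al., 2016) that the abstract names as its base framework: Poisson subsampling, per-example gradient clipping, and Gaussian noise. It does not reproduce the paper's feature-DP modification for public features. The function name dp_sgd_logistic and the hyperparameters (clip_norm, noise_multiplier, sample_rate) are illustrative assumptions, not identifiers from the paper.

    # Sketch of standard DP-SGD on logistic regression. This is the base
    # mechanism the abstract builds on, NOT the paper's feature-DP variant;
    # all names and hyperparameters here are illustrative assumptions.
    import numpy as np

    def dp_sgd_logistic(X, y, steps=200, lr=0.1, clip_norm=1.0,
                        noise_multiplier=1.0, sample_rate=0.1, seed=0):
        """Logistic regression via DP-SGD: Poisson subsampling,
        per-example gradient clipping, Gaussian noise."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(steps):
            # Poisson subsampling: each example joins the batch independently
            # with probability sample_rate; this independence is what yields
            # privacy amplification by subsampling.
            mask = rng.random(n) < sample_rate
            grad_sum = np.zeros(d)
            if mask.any():
                Xb, yb = X[mask], y[mask]
                probs = 1.0 / (1.0 + np.exp(-(Xb @ w)))   # sigmoid(w . x)
                grads = (probs - yb)[:, None] * Xb        # per-example gradients
                # Clip each per-example gradient to L2 norm <= clip_norm, so a
                # single record's influence on the sum is bounded.
                norms = np.linalg.norm(grads, axis=1, keepdims=True)
                grads *= np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
                grad_sum = grads.sum(axis=0)
            # Gaussian noise calibrated to the clipping norm; noise is added
            # even when the sampled batch happens to be empty.
            noisy_sum = grad_sum + rng.normal(0.0, noise_multiplier * clip_norm, d)
            w -= lr * noisy_sum / (sample_rate * n)
        return w

    # Toy usage on synthetic data (purely illustrative).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 20))
    y = (X[:, 0] > 0).astype(float)
    w = dp_sgd_logistic(X, y)

The Poisson subsampling step is the ingredient the abstract highlights: because each record participates in a given step only with a small probability, the per-step privacy cost shrinks, and the abstract's claim is that this amplification survives even when some features of each record are publicly disclosed.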