Track: Privacy 1

Thu 22 July 6:00 - 6:20 PDT

Oral

Differentially Private Query Release Through Adaptive Projection

Sergul Aydore · William Brown · Michael Kearns · Krishnaram Kenthapadi · Luca Melis · Aaron Roth · Ankit Siva

We propose, implement, and evaluate a new algo-rithm for releasing answers to very large numbersof statistical queries likek-way marginals, sub-ject to differential privacy. Our algorithm makesadaptive use of a continuous relaxation of thePro-jection Mechanism, which answers queries on theprivate dataset using simple perturbation, and thenattempts to find the synthetic dataset that mostclosely matches the noisy answers. We use a con-tinuous relaxation of the synthetic dataset domainwhich makes the projection loss differentiable,and allows us to use efficient ML optimizationtechniques and tooling. Rather than answering allqueries up front, we make judicious use of ourprivacy budget by iteratively finding queries forwhich our (relaxed) synthetic data has high error,and then repeating the projection. Randomizedrounding allows us to obtain synthetic data in theoriginal schema. We perform experimental evalu-ations across a range of parameters and datasets,and find that our method outperforms existingalgorithms on large query classes.

Thu 22 July 6:20 - 6:25 PDT

Spotlight

Differentially Private Quantiles

Jennifer Gillenwater · Matthew Joseph · Alex Kulesza

Quantiles are often used for summarizing and understanding data. If that data is sensitive, it may be necessary to compute quantiles in a way that is differentially private, providing theoretical guarantees that the result does not reveal private information. However, when multiple quantiles are needed, existing differentially private algorithms fare poorly: they either compute quantiles individually, splitting the privacy budget, or summarize the entire distribution, wasting effort. In either case the result is reduced accuracy. In this work we propose an instance of the exponential mechanism that simultaneously estimates exactly $m$ quantiles from $n$ data points while guaranteeing differential privacy. The utility function is carefully structured to allow for an efficient implementation that returns estimates of all $m$ quantiles in time $O(mn\log(n) + m^2n)$ . Experiments show that our method significantly outperforms the current state of the art on both real and synthetic data while remaining efficient enough to be practical.

Thu 22 July 6:25 - 6:30 PDT

Spotlight

PAPRIKA: Private Online False Discovery Rate Control

Wanrong Zhang · Gautam Kamath · Rachel Cummings

In hypothesis testing, a \emph{false discovery} occurs when a hypothesis is incorrectly rejected due to noise in the sample. When adaptively testing multiple hypotheses, the probability of a false discovery increases as more tests are performed. Thus the problem of \emph{False Discovery Rate (FDR) control} is to find a procedure for testing multiple hypotheses that accounts for this effect in determining the set of hypotheses to reject. The goal is to minimize the number (or fraction) of false discoveries, while maintaining a high true positive rate (i.e., correct discoveries). In this work, we study False Discovery Rate (FDR) control in multiple hypothesis testing under the constraint of differential privacy for the sample. Unlike previous work in this direction, we focus on the \emph{online setting}, meaning that a decision about each hypothesis must be made immediately after the test is performed, rather than waiting for the output of all tests as in the offline setting. We provide new private algorithms based on state-of-the-art results in non-private online FDR control. Our algorithms have strong provable guarantees for privacy and statistical performance as measured by FDR and power. We also provide experimental results to demonstrate the efficacy of our algorithms in a variety of data environments.

Thu 22 July 6:30 - 6:35 PDT

Spotlight

Privacy-Preserving Video Classification with Convolutional Neural Networks

Sikha Pentyala · Rafael Dowsley · Martine De Cock

Many video classification applications require access to personal data, thereby posing an invasive security risk to the users' privacy. We propose a privacy-preserving implementation of single-frame method based video classification with convolutional neural networks that allows a party to infer a label from a video without necessitating the video owner to disclose their video to other entities in an unencrypted manner. Similarly, our approach removes the requirement of the classifier owner from revealing their model parameters to outside entities in plaintext. To this end, we combine existing Secure Multi-Party Computation (MPC) protocols for private image classification with our novel MPC protocols for oblivious single-frame selection and secure label aggregation across frames. The result is an end-to-end privacy-preserving video classification pipeline. We evaluate our proposed solution in an application for private human emotion recognition. Our results across a variety of security settings, spanning honest and dishonest majority configurations of the computing parties, and for both passive and active adversaries, demonstrate that videos can be classified with state-of-the-art accuracy, and without leaking sensitive user information.

Thu 22 July 6:35 - 6:40 PDT

Spotlight

Clustered Sampling: Low-Variance and Improved Representativity for Clients Selection in Federated Learning

Yann Fraboni · Richard Vidal · Laetitia Kameni · Marco Lorenzi

This work addresses the problem of optimizing communications between server and clients in federated learning (FL). Current sampling approaches in FL are either biased, or non optimal in terms of server-clients communications and training stability. To overcome this issue, we introduce clustered sampling for clients selection. We prove that clustered sampling leads to better clients representatitivity and to reduced variance of the clients stochastic aggregation weights in FL. Compatibly with our theory, we provide two different clustering approaches enabling clients aggregation based on 1) sample size, and 2) models similarity. Through a series of experiments in non-iid and unbalanced scenarios, we demonstrate that model aggregation through clustered sampling consistently leads to better training convergence and variability when compared to standard sampling approaches. Our approach does not require any additional operation on the clients side, and can be seamlessly integrated in standard FL implementations. Finally, clustered sampling is compatible with existing methods and technologies for privacy enhancement, and for communication reduction through model compression.

Thu 22 July 6:40 - 6:45 PDT

Spotlight

Differentially Private Correlation Clustering

Mark Bun · Marek Elias · Janardhan Kulkarni

Correlation clustering is a widely used technique in unsupervised machine learning. Motivated by applications where individual privacy is a concern, we initiate the study of differentially private correlation clustering. We propose an algorithm that achieves subquadratic additive error compared to the optimal cost. In contrast, straightforward adaptations of existing non-private algorithms all lead to a trivial quadratic error. Finally, we give a lower bound showing that any pure differentially private algorithm for correlation clustering requires additive error Ω(n).

Thu 22 July 6:45 - 6:50 PDT

Spotlight

Accuracy, Interpretability, and Differential Privacy via Explainable Boosting

Harsha Nori · Rich Caruana · Zhiqi Bu · Judy Hanwen Shen · Janardhan Kulkarni

We show that adding differential privacy to Explainable Boosting Machines (EBMs), a recent method for training interpretable ML models, yields state-of-the-art accuracy while protecting privacy. Our experiments on multiple classification and regression datasets show that DP-EBM models suffer surprisingly little accuracy loss even with strong differential privacy guarantees. In addition to high accuracy, two other benefits of applying DP to EBMs are: a) trained models provide exact global and local interpretability, which is often important in settings where differential privacy is needed; and b) the models can be edited after training without loss of privacy to correct errors which DP noise may have introduced.

Thu 22 July 6:50 - 6:55 PDT

Q&A

Main Navigation

Session

Privacy 1

Differentially Private Query Release Through Adaptive Projection

Differentially Private Quantiles

PAPRIKA: Private Online False Discovery Rate Control

Privacy-Preserving Video Classification with Convolutional Neural Networks

Clustered Sampling: Low-Variance and Improved Representativity for Clients Selection in Federated Learning

Differentially Private Correlation Clustering

Accuracy, Interpretability, and Differential Privacy via Explainable Boosting

Q&A