Paper ID: 278
Title: Learning privately from multiparty data

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.): This paper considers a quite interesting problem: how to learn an accurate and differentially private global classifier from locally-trained classifiers on multiparty data. It proposes two algorithms that make use of unlabeled data, which, in my view, is the main contribution of this paper beyond previous work. The main idea can be stated as follows. First, train local classifiers on the corresponding local datasets. Then leverage them to label the unlabeled data, and finally use these newly labeled data to obtain the final private classifier. This method is quite flexible and robust, in the sense that it works regardless of the types of the local classifiers. The authors also generalize the method to more general settings, such as multi-class classification. In addition, numerical experiments show the algorithms' good performance in practice.

Clarity - Justification: The paper is well written.

Significance - Justification: The algorithms and the privacy/utility analysis are nice, but somewhat similar to Chaudhuri et al. (2011).

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): This paper considers a quite interesting problem: how to learn an accurate and differentially private global classifier from locally-trained classifiers on multiparty data. It proposes two algorithms that make use of unlabeled data, which, in my view, are the main contributions of this paper beyond previous work. The idea can be stated as follows. First, train local classifiers on the corresponding local datasets. Then leverage them to label the unlabeled data, and finally use these newly labeled data to obtain the final private classifier (see the sketch at the end of this review). This method is quite flexible and robust, in the sense that it works regardless of the types of the local classifiers. The authors also generalize the method to more general settings, such as multi-class classification. In addition, numerical experiments show the algorithms' good performance in practice.

In all, this paper not only proposes novel algorithms, but also provides solid privacy and performance guarantees for them. Both the statement and the proof of the performance guarantee are similar to those of Chaudhuri et al. (2011), except for the following difference: in the Chaudhuri et al. paper, performance is measured by the loss between the predictions and the true labels y, while in this paper the performance guarantee (Theorem 4) is given in terms of the weighted label \alpha(x) rather than the true label y. So I wonder whether it is possible to give a performance analysis in terms of R(w) = E_{x,y}[l(h(x; w), y)], where w is obtained through either Algorithm 1 or Algorithm 2.

Some minor comments:
1. Since Q(j|x) (Eq. 15) is a probability over the randomness of the hypotheses, what is the relationship between Q(j|x) and P(y = j | x)?
2. The experimental result on the third dataset (Fig. 4 in the paper) is strange for the Vote algorithm. It would be better if the authors could explain why the performance improves as privacy becomes stricter (i.e., as \epsilon becomes smaller).
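For concreteness, here is a minimal sketch of the pipeline as I read it, assuming binary labels in {0, 1} and linear models; the function name, the choice of logistic regression, and the Laplace noise scale are my own illustration, not the paper's exact construction:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def private_global_classifier(local_datasets, X_unlabeled, eps, rng):
        # Stage 1: each party trains a classifier on its own private data.
        local_models = [LogisticRegression().fit(X, y) for X, y in local_datasets]

        # Stage 2: the ensemble labels the public unlabeled data. A plain
        # majority vote is shown here; the paper instead uses the soft
        # labels of Eq. (14)/(15).
        votes = np.stack([m.predict(X_unlabeled) for m in local_models])
        y_aux = (votes.mean(axis=0) > 0.5).astype(int)

        # Stage 3: fit the global classifier on the auxiliary data and
        # release it with output perturbation. The Laplace scale below is
        # illustrative, not the calibration from the paper's Theorem 3.
        global_model = LogisticRegression().fit(X_unlabeled, y_aux)
        global_model.coef_ += rng.laplace(scale=2.0 / (eps * len(y_aux)),
                                          size=global_model.coef_.shape)
        return global_model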
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.): Suppose multiple entities own private samples from a distribution and build their own classifiers. The goal is to find a global classifier for the data without violating the privacy of the individual data parts. This paper observes that a standard approach based on building a global classifier and then perturbing it is too sensitive to individual classifiers, and proposes an alternative that uses "soft labels" (see the sketch at the end of this review). One interesting idea here is that the paper makes use of auxiliary unlabeled data to build the global classifier, which makes it easier to ensure the privacy of the individual sources. The authors also provide generalization bounds for their approach, although the reference comparison is not exactly to a non-private global classifier.

Clarity - Justification: It's a well-written paper.

Significance - Justification: It's a natural problem, and the solutions proposed draw on prior work in a nice way. It's not a substantially new contribution.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The authors assemble some standard ideas together in an interesting way. I hadn't thought about the use of auxiliary data to help with the global release before: while the idea is not novel (it appears in a workshop paper from 2013), it's still a nice trick to enable a private release of a classifier.

The method is crucially limited to global classifiers that are expressible in a "kernel-ish" way, because of how the weight w is perturbed. What happens if you want to release a decision tree or a random forest? Is it sufficient to design some kind of differentially private mechanism for structured outputs and combine it in a cookie-cutter way with the local classifiers?

I didn't get the sense that the experiments demonstrated clear differences between the proposed methods and the prior work (Pathak et al.). As a separate minor nit, it was hard to remember which data series was yours: more informative names would be helpful. Also, it's a little unfair to compare your method (which is, in effect, semi-supervised) with only a purely supervised method: why not compare the effectiveness of the classifier with the Jagannathan et al. approach (granted, they were looking at a more limited setting)? Which makes me wonder: how important is the unlabeled data? There is a price you have to pay for finding and using unlabeled data: could you perhaps "bootstrap" this from the private data in a safe way?
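To make the "soft label" point concrete, here is roughly what I take Q(j|x) (Eq. 15) to be, namely the empirical distribution of the local classifiers' votes on x; this is my own reading, and the paper's exact weighting may differ:

    import numpy as np

    def soft_label(local_models, x, n_classes):
        # Q(j|x): the fraction of the M local classifiers that vote for
        # class j on input x. Assumes integer class labels 0..n_classes-1.
        votes = np.array([m.predict(x.reshape(1, -1))[0] for m in local_models])
        return np.bincount(votes.astype(int), minlength=n_classes) / len(local_models)

A single party changing its vote moves each coordinate of this distribution by at most 1/M, which is exactly the low-sensitivity behavior that the paper's vote-insensitive risk is designed to exploit.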
===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.): The paper provides a convenient framework for exchanging information about local classifiers in order to build a single global classifier. It allows for exchanging information about general classifiers (e.g., decision trees), which makes the paper extremely interesting. The contributions:
1. A notion of risk that is insensitive to individual votes.
2. A generalization error that converges at the rates O(eps^-2 M^-2) and O(N^-1).

Clarity - Justification: The paper is extremely well written. There is sufficient background material to provide an introduction for the general reader. The only suggestion is that Section 5 could be moved up, right after Section 2.

Significance - Justification: Exchanging classifiers while maintaining privacy is an interesting problem with a wide variety of practical applications. The results in this paper allow for the exchange of general classifiers, which makes the work significant.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
1. Motivation: Averaging is a simple way to exchange classifiers. However, it is not applicable to classifiers with non-numeric parameters, such as decision trees.
2. Procedure (Fig. 1): Local classifiers are collected by a trusted entity, which uses them to label public data. A final classifier is trained on this labeled data and distributed using output perturbation.
3. Conditions on the loss function: convexity and continuous differentiability with a finite derivative. Only linear classifiers are considered.
4. Why not release the local classifiers themselves after output perturbation? That is an operation with constant sensitivity, as opposed to the O(M^-1) sensitivity of average statistics (see the sketch after this list). Hence labeling auxiliary unlabeled data is used as an alternative way to distribute the information contained in the ensemble.
5. Eq. (14) describes the modified notion of risk that is insensitive to the change of a label by a single vote. It is obtained by replacing the hard thresholding in Eq. (13) with a soft threshold, and it is used to obtain the labels for the auxiliary data.
6. The global classifier obtained by training on the labeled auxiliary data is shown to be \epsilon-differentially private in Theorem 3. The weighted empirical risk is bounded in Theorem 4.
7. The experimental results seem to provide additional support for the theoretical results obtained.
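To unpack item 4, the contrast between the two release strategies looks roughly like this; the sensitivities are only the orders stated above, with all constants set to 1, and local_weights is a stand-in for the parties' trained weight vectors:

    import numpy as np

    eps, M, d = 1.0, 50, 10
    rng = np.random.default_rng(0)
    local_weights = rng.normal(size=(M, d))  # stand-in for trained weights

    # (a) Release every local classifier with output perturbation: the
    #     sensitivity is a constant, so the noise scale does not shrink
    #     as the number of parties M grows.
    noisy_locals = local_weights + rng.laplace(scale=1.0 / eps, size=(M, d))

    # (b) Release only the average (the Pathak et al. approach): the
    #     sensitivity is O(1/M), so a single party's change is diluted
    #     and much less noise is needed for the same eps.
    noisy_avg = local_weights.mean(axis=0) + rng.laplace(scale=1.0 / (M * eps), size=d)

The paper's route through auxiliary unlabeled data is a third option, avoiding both the constant-sensitivity release in (a) and the numeric-parameter restriction of the averaging in (b).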
=====