We thank the reviewers for their insightful and useful comments. Below we address specific points raised by the reviewers.

< Reviewer 1 >

Comparison with Chaudhuri'11: Our approach is motivated by Chaudhuri et al., as we acknowledge in the paper. However, their approach is not directly applicable to the multiparty setting, since the amount of noise required to ensure privacy would be too large to allow accurate classification. We view overcoming this issue by introducing a new weighted loss function as one of our main contributions. The other main contribution of our paper is the use of unlabeled data to share knowledge between classifiers.

Performance in terms of E_{x,y}[l(h(x;w),y)]: We analyzed the performance using E_x[l^\alpha(h(x;w), \alpha(x))] with respect to the given {\alpha(x)}. When M\to\infty, the estimate \alpha(x) converges to the true conditional probability P(y=1|x), and the standard and weighted risks are in fact the same, as shown in our Lemma 2 (p. 5). For finite M, the two risks differ by a gap no larger than O(1/M). (A code sketch of the two risks is given at the end of this response.)

Q(j|x) is P(y=j|x). We use the notation of Breiman'96 for the `average classifier.'

Fig. 4, increasing accuracy of the voting algorithm with higher privacy: We attribute this to the fixed hyperparameter \lambda used to simplify the experiments; that is, \lambda=10^{-4} may be closer to optimal for the voting algorithm in high-privacy regimes than in low-privacy regimes. However, the difference is small (not much more than one standard deviation) and is possibly a statistical aberration.

< Reviewer 3 >

Comparison with Jagannathan'13: The workshop paper uses unlabeled data to partition the feature space before building a random forest, whereas we use unlabeled data to build auxiliary labeled data with an ensemble of classifiers of any type (see the sketch at the end of this response). Their paper focuses on the privacy of the collected samples leaked by the released random forest, while ours focuses on the privacy of the locally trained classifiers (and consequently the local samples as well) leaked by the global ERM classifier. We will clarify this in the revision.

Using decision trees or random forests: As the reviewer points out, the auxiliary labeled data produced by our algorithm can be used to train other types of global classifiers, including decision trees and random forests. However, unlike private ERM, the performance of private decision trees or random forests is not precisely characterized.

Comparison with supervised methods: In our experiments, the unlabeled dataset is 6--12 times _smaller_ than the labeled dataset (1K vs. 6K for activity, 43K vs. 450K for KDD, 16K vs. 184K for URL). It is unlikely that such a small amount of unlabeled data would make a noticeable difference in classification accuracy. We will add a note in the revision.

"Bootstrapping" private data: This is an interesting question. In the typical use scenario we envision, using private data as auxiliary data seems questionable, although there may be situations where such use is justified. Regarding the privacy of the auxiliary data, we present differentially private extensions in Section 4.5.
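
To make the weighted-risk discussion in our response to Reviewer 1 concrete, the following minimal sketch contrasts the standard empirical risk E_{x,y}[l(h(x;w),y)] with the weighted empirical risk E_x[l^\alpha(h(x;w), \alpha(x))]. The logistic-loss choice and all function names here are our illustrative assumptions, not the exact formulation from the paper:

```python
import numpy as np

def logistic_loss(margin):
    # l(h(x; w), y) for a linear classifier with y in {-1, +1}:
    # log(1 + exp(-y * <w, x>)), evaluated at margin = y * <w, x>.
    return np.log1p(np.exp(-margin))

def standard_risk(w, X, y):
    # Empirical version of E_{x,y}[l(h(x; w), y)] with hard labels y.
    return logistic_loss(y * (X @ w)).mean()

def weighted_risk(w, X, alpha):
    # Empirical version of E_x[l^alpha(h(x; w), alpha(x))]: each point
    # contributes a mixture of the losses for y = +1 and y = -1,
    # weighted by the estimated conditional probability alpha(x) ~ P(y=1|x).
    margins = X @ w
    return (alpha * logistic_loss(margins)
            + (1.0 - alpha) * logistic_loss(-margins)).mean()
```

As \alpha(x) converges to P(y=1|x) (e.g., as M\to\infty), weighted_risk approaches standard_risk, mirroring the statement of Lemma 2.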
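
Similarly, the auxiliary-data construction mentioned in our response to Reviewer 3 can be sketched as follows. This is a simplified illustration only: the simple probability-averaging rule, the function name, and the sklearn-style predict_proba interface are our assumptions, not the exact procedure from the paper:

```python
import numpy as np

def soft_labels_from_ensemble(local_classifiers, X_unlabeled):
    # Average the per-party estimates of P(y=1|x) over the ensemble of
    # locally trained classifiers (an "average classifier" in the sense
    # of Breiman'96), yielding soft labels alpha(x) for the shared
    # unlabeled points. These labeled points then serve as auxiliary
    # data for training the global classifier.
    probs = np.stack([clf.predict_proba(X_unlabeled)[:, 1]
                      for clf in local_classifiers])
    return probs.mean(axis=0)  # alpha(x), one value per unlabeled point
```

Because the ensemble may mix classifier types, this step is what allows the resulting auxiliary labeled data to train global classifiers other than ERM, such as decision trees or random forests.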