We thank all three Reviewers for judging the "elegant/neat useful generalization of k-means++" (Rev. 1/Rev. 2) which “can have notable impact" (Rev. 1), with strong theoretical and experimental validation (Rev. 3).$
Along these lines follows our per-review response:
** Rev. 1 :
* reply to “Overall, the paper […] easier to follow” : To enable us to improve the clarity of our presentation, first, we will free up some space by: reducing the explanations after Theorem 12 to essentially eq. 22; moving the remainder of this content to the differential privacy part in the supplementary information (SI), along with row alpha, beta in Table 1 (we believe it would be sufficient to leave just one comparison with differential privacy in the main file). This will result in all Sections in the main file having a similar level of technical detail, and will make the SI useful for drilling down further into any of our applications of k-variates++.
This will also free up close to a full column, which we will use for two objectives:
1- to better explain in Section 2 the role of the probe function (Cf Rev. 2, 3)
2- to add a synthetic chart in Section 5 on the clustering potentials obtained by our method wrt contenders on our experimental domains (Cf Rev. 1).
* reply to “For the experimental […] datasets” : as written above, the space
we propose to gain shall be used in part to include this chart in the experimental results.
** Rev. 2 :
* reply to “The model for the distributed setting […] parallel computation ?” :
Indeed, we presented Dk-means++ in a privacy setting rather than a distributed
computation setting, because of the links with the privacy section (Section 4) and the convenience offered by the algorithm in this case. We propose to rephrase at least the beginning of the section “distributed clustering” in a more conventional synchronous
client-server architecture, and move some privacy-related specifics to the SI.
We also note (and will integrate in the paper) that (i) the presence of node N is not mandatory (any node can compute the distribution in a synchronous fashion) and (ii) the notion of having a “special” node in collaborative services is common in a number of distributed domains: in heterogeneous grid computing [1]; in hybrid, or server assisted peer to peer (P2P) networks [2], [3].
[1] Dong, Fangpeng, and Selim G. Akl. “Scheduling algorithms for grid computing: State of the art and open problems”. Technical Report No. 2006-504, Queens University, Canada, 2006.
[2] Yang, Beverly, and Hector Garcia-Molina. "Comparing hybrid peer-to-peer systems." Proceedings of the 27th Intl. Conf. on Very Large Data Bases. 2001.
[3] Lu, Jie, and Jamie Callan. "Content-based retrieval in hybrid peer-to-peer networks." Proceedings of the twelfth international conference on Information and knowledge management. ACM, 2003.
* reply to “The parameters […] in practice” : there is indeed a very big difference
between Theorem 6 and for example Theorem 9. Parameter $\xi$ can hardly be computed since it depends on the (unknown) optimal clustering, however (i) not knowing it does not prevent running of OLk-means++ with the guarantees given by the Theorem (which is not the case of Liberty et al’s algorithm 1), (ii) not knowing anything about
the optimum does not prevent running of OLk-means++ with the guarantees given by the Theorem (which is not the case of Liberty et al’s algorithm 2), and above all (iii) Theorem 12 explicitly shows that (and why) some streams are eventually easier to cluster than others (a stream dependent bound which we feel is important but essentially absent from Liberty’s paper), and why/when OLk-means++ will perform better: namely when optimal clusters are not too small and mini batches chop them in not too small sets either. This also brings some interesting strategies on how to devise minibatches for OLk-means++, a problem obviously out of the scope of our paper.
** Rev. 3 :
* comment on “A little more explanation of probe function […] useful”: as per our response to Rev. 1, we plan to use part of the freed-up space in the main file to explain the choice of probe functions. (we will also take care of the “Id” notation)
* reply to “The online […] Theorem 6 is interesting” : we also refer to our reply to Rev. 2 above.
We agree with the reviewer: “batching”/summary strategies are popular, beyond learning (as we also explain for streaming in the paper). We believe that Theorem 6 is interesting because it provides insight on strategies to optimise minibatching via a clearly identifiable knob, $\xi$.
On domains like on-line and streamed processing (i.e. beyond learning), where summarising strategies have proven their potential, we contribute to show that, and how, they are probably a key to efficient clustering as well.