Paper ID: 437
Title: Fast k-means with accurate bounds

Review #1
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): A new and faster version of Lloyd's (exact) k-means algorithm is presented. It makes two new contributions:
1.) A new method of bounding the distance to the closest centroid in a sophisticated way (donut-shaped regions around each centroid that improve the search for the second-best centroid).
2.) An improved update of the bounds in each step, leading to a tighter bound (using the direct distance instead of the sum of distances after a series of updates).
While 1.) yields a new algorithm, 2.) can be plugged into other algorithms. Extensive experiments show that 2.) slightly improves the run-time independently of the algorithm, and that 1.) yields an improvement by a factor of 3.
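To spell out the second point: writing $c^{(i)}$ for a centroid's position after iteration $i$ (illustrative notation, not necessarily the paper's), the tightening rests on the triangle inequality
$\|c^{(t)} - c^{(t_0)}\| \le \sum_{i=t_0+1}^{t} \|c^{(i)} - c^{(i-1)}\|$,
so updating a stored bound with the single direct displacement on the left is never looser than accumulating the per-iteration displacements on the right.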
Clarity - Justification: The algorithm is described in a clear way. It is excellently set into the context of other related algorithms. It is compared to other algorithms in experiments that seem to be reproducible.

Significance - Justification: Even though run-time improvements by a factor of 3 can easily be obtained by buying a number of cheap computers, I consider this work a substantial improvement, because Lloyd's k-means algorithm is one of the most important algorithms in machine learning. The authors convinced me that this run-time improvement is achievable in practical applications.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): I would like to add one criticism: it is often said that k-means is an NP-hard problem. This is only true if the dimension of the data grows faster than log(n). Given that the authors mainly deal with low-dimensional problems, they should mention this adequately.
=====

Review #2
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): The paper "Fast k-means with accurate bounds" proposes an exact k-means algorithm which performs better than the current state-of-the-art low-dimensional algorithms. In the first parts of the paper, related approaches are reviewed. After an introduction and theoretical derivation of the novel approach, an experimental part shows that the novel approach is up to three times faster in 18 of 22 experiments. Moreover, the paper proposes extensions of other k-means approaches for reducing the number of distance computations. Experiments are conducted with the authors' own implementations. Simplified variants turn out to be faster than their full counterparts.

Clarity - Justification: The paper is clearly written. The structure is nice. The beginning spends a bit too much space on related work, but as these methods are employed in the experimental section and aid the overall understanding, the ratio of space is OK. The authors claim to have re-implemented all algorithms for comparison. This improves the reproducibility of the results, as code and experiments come from one source.

Significance - Justification: I would not say that a fast exact k-means variant is a major breakthrough in clustering. It is more an incremental advance. However, for the clustering community, the approach offers a valuable result.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The paper introduces Elkan's algorithm, a simplified variant, Hamerly's, Annulus, simplified Yinyang, and Yinyang. The bounds derived show the speed win. In the experiments, the authors impose a time limit of 40 minutes and a memory limit of 4 GB on all runs. The results are convincing and replicable. The comprehensive baseline comparisons are an important part of the paper. The comparisons take into account single- and multi-core experiments, which is another advantage and shows the careful experimental design.
=====

Review #3
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): This paper proposes an acceleration technique for exact $k$-means. Here, an exact algorithm means an instantiation of Lloyd's algorithm (with no randomness/sampling injected into the assignment/update steps) for fair comparison. The previous acceleration techniques resemble nearest-neighbor-based acceleration techniques. Each assignment/update step naively requires $O(kN)$ computation, and these techniques can in most cases reduce this cost. This paper extends this direction and proposes the Exponion algorithm (closely related to the Annulus algorithm but faster). The insight for the speedup comes from examining the outer test used in Hamerly's algorithm. The outer test enables skipping the distance computations between a point and the set of all centroids. The Annulus algorithm adds an origin-centered filter: the set of centroids that must be considered can therefore be narrowed down by keeping the norms of the current centroids in sorted order and applying binary searches. In contrast, the Exponion algorithm shifts the filter from the origin to the current centroid assignment of the point under consideration. However, in order to apply the filter exactly, the overhead of sorting increases to $O(k^2 \log k)$ due to having to consider all pairs of centroids. The solution is to give up the complete ordering for a partial ordering based on maintaining $\log_2 k$ concentric annuli around each centroid. The second improvement comes from replacing the sum of the norms of the displacements with the norm of the sum of the displacements.

Clarity - Justification: The paper succinctly summarizes the previous major approaches: standard Lloyd's algorithm, simplified Elkan's algorithm, Elkan's algorithm, Hamerly's algorithm, and the Annulus algorithm. It would have been nicer if the authors could tabulate the space requirements for storing the running lower/upper bounds for each of these approaches. This could help readers understand why some of the results are marked with "m" (out of memory) in Section 4. The italics in the experiment section are not easy to distinguish from the rest of the text; additionally bolding the text may help. In Section 4.1.2: "disactivate" -> "inactivated" or "not in use". Why are the simplified versions of the algorithms more responsive to BLAS? Are there computational operations that can be vectorized more easily?

Significance - Justification: Pretty comprehensive literature survey on state-of-the-art methods for fast k-means, with experimental validation and new methods to go along with it.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): Very well-written paper - see above for comments on clarifying the presentation of the experimental section.
=====
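To make the origin-centered filter described in Review #3 concrete, the following is a minimal Python sketch of the norm-sorting-plus-binary-search idea (illustrative code, not taken from the paper; the function and variable names are invented here, and the radius handling is simplified relative to the actual Annulus/Exponion tests):

    import bisect
    import numpy as np

    def annulus_candidates(x, centroids, radius):
        # By the reverse triangle inequality, dist(x, c) >= | ||x|| - ||c|| |,
        # so any centroid within `radius` of x must have a norm inside the
        # annulus [ ||x|| - radius, ||x|| + radius ]. Sorting the centroid
        # norms once lets two binary searches pick out the candidates.
        norms = np.linalg.norm(centroids, axis=1)
        order = np.argsort(norms)           # centroid indices sorted by norm
        sorted_norms = norms[order]
        x_norm = np.linalg.norm(x)
        lo = bisect.bisect_left(sorted_norms, x_norm - radius)
        hi = bisect.bisect_right(sorted_norms, x_norm + radius)
        return order[lo:hi]                 # indices needing an exact distance check

    # Toy usage: in the real algorithms the radius is derived from the point's
    # running bounds (so the current assignment always lies inside the annulus);
    # here it is just a fixed number for illustration.
    rng = np.random.default_rng(0)
    C = rng.normal(size=(100, 2))           # 100 centroids in 2-D
    x = np.array([0.5, -0.2])
    cand = annulus_candidates(x, C, radius=1.0)
    # nearest centroid among the candidates (equals the global nearest
    # whenever the radius is chosen large enough to be valid)
    nearest = min(cand, key=lambda j: np.linalg.norm(x - C[j]))

As the summary in Review #3 notes, the Exponion algorithm centers the analogous filter on the point's currently assigned centroid rather than on the origin, and relaxes the full ordering to $\log_2 k$ concentric annuli per centroid to keep the sorting overhead manageable.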