Paper ID: 50
Title: Data-driven Rank Breaking for Efficient Rank Aggregation

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
In this paper, the authors study the technique of rank breaking in rank aggregation, whereby a (partial) ranking is split into a set of pairwise comparisons, from which a maximum likelihood estimate of the underlying probability distribution over rankings (the authors assume the Plackett-Luce model) can be computed more efficiently. In general, however, the estimates thus obtained are biased. To produce unbiased estimates, the idea is to treat the paired comparisons unequally, depending on the topology of the collected data. The authors provide an optimal rank-breaking estimator and show that it not only achieves consistency but also the best error bound.

Clarity - Justification: see detailed comments

Significance - Justification: see detailed comments

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The topic addressed in the paper is very interesting, and the results appear to be sound (although I did not carefully check the extremely long appendix).
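As an editorial aside for readers unfamiliar with the technique Review #1 summarizes: rank breaking converts a partial ranking into the pairwise comparisons it implies. A minimal illustrative sketch follows; the function name is hypothetical, and the uniform "full breaking" shown here is exactly the unweighted scheme the paper improves on by weighting the pairs unequally.

```python
from itertools import combinations

def full_break(partial_ranking):
    """Break a partial ranking (listed best to worst) into all implied
    pairwise comparisons, as (winner, loser) tuples.

    Hypothetical helper for illustration only: the reviewed paper's
    estimator assigns unequal, data-driven weights to these pairs
    rather than treating them uniformly as done here."""
    return [(winner, loser) for winner, loser in combinations(partial_ranking, 2)]

pairs = full_break(["a", "b", "c"])
# [('a', 'b'), ('a', 'c'), ('b', 'c')]
```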
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper proposes a new rank-breaking algorithm that provides optimal bounds on the estimation error. This is done by connecting the accuracy of estimation under rank breaking to the spectral gap of the rank-breaking graph.

Clarity - Justification: The paper is well written and easy to follow. It also covers methods, theory, and empirical results in a solid and connected manner.

Significance - Justification: This paper extends rank-breaking algorithms. Former rank-breaking methods relied on a generic breaking for all data; this paper proposes an adaptive breaking strategy that depends on the data. Deriving error bounds is a new addition to the literature on rank data.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
This paper makes a solid proposal and extends the modeling of rank data using rank breaking. The focus on the topology of the data is key to the improvement, and the results on both the theoretical and empirical sides support this. I have a few suggestions/questions:

Line 230: The authors mention logit regression. Since this paper only models the parameters and the model does not include features, it might be better to drop "regression" to avoid confusion.

"a" and "b" are both used as notation for alternatives as well as for some variables (line 522). I suggest changing one of them to avoid confusion.

The empirical comparisons need standard deviations to make the different methods comparable; for example, Figure 4 has curves that lie close together, and it is important to discuss the statistical significance of the differences.

I would like to see a discussion of computational complexity: what price do we pay, in terms of computation, by adopting this new method? Some theoretical arguments, or at least empirical comparisons, are essential to understand whether rank breaking is being made more complex than straight MLE optimization.

Azari et al.
2014 proposes a general moment function of which the likelihood function is a special case; I am wondering whether the framework and results are expandable beyond the likelihood function to the generalized method of moments case.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper considers ranking data generated by the Plackett-Luce model, where there are d items, each item i having a parameter theta_i for 1 \leq i \leq d that determines how popular the item is. There are n samples from which to reconstruct this parameter vector theta, with each sample being a partially observed ranking. The paper gives an algorithm for this task that is computationally much more feasible than computing the MLE, and gives finite-sample bounds for the error of this algorithm. Essentially, the algorithm can be viewed as optimizing a weighted approximate likelihood. One important point is that the paper also gives a data-driven choice for the weights in the algorithm. It also computes the minimax error rate when each partially observed ranking is of a simpler form.

Clarity - Justification: The paper explains all the concepts clearly and is easy to follow. The high-level proof ideas are also well explained. I did not go through the proofs in the supplementary file.

Significance - Justification: As explained in the paper, the weighted version of the approximate likelihood (via separators and the corresponding rank-breaking graphs) considered in this paper is not new and has been introduced elsewhere. What is also known from previous work is that there are some choices of weights that consistently recover the parameter theta. The main contribution of this paper seems to be that there is an optimal choice of weights, and this choice has been made explicit, which helps practitioners implement the algorithm. The paper complements this with synthetic experiments that demonstrate a significant gain from using the proposed weighting scheme.
However, I do not intuitively follow why this choice of weights is optimal (see detailed comments below).

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper is well written and considers an interesting problem. One concern I have is about the definition of the parameter gamma in equation (13). It seems strange that the positions of the separators affect the algorithm so much. Since p_{j,l_j} is the position of the last separator, the algorithm somehow prefers all the separators to be in the early part of the sample. I do not understand this intuitively. Is this a by-product of the proof technique, or is there a reason for it? A similar comment applies to the proposed choice of weights: why should separators positioned towards the end have more weight?

A minor comment: P(sigma_i) in line 200 on page 2 should be P(sigma_j).

=====
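Editorial note: for readers who want a concrete picture of the generative model that Review #3 summarizes, here is a minimal sketch of sampling one ranking from a Plackett-Luce model. The function name and the parameterization (item weights w_i = exp(theta_i)) are illustrative assumptions, not the paper's implementation: the next item is drawn with probability proportional to its weight among the items not yet placed.

```python
import math
import random

def sample_pl_ranking(theta, rng=None):
    """Draw one full ranking from a Plackett-Luce model.

    Illustrative sketch (names and parameterization are assumptions):
    each item i gets weight w_i = exp(theta_i); positions are filled
    top-down by sampling the next item with probability proportional
    to its weight among the remaining items."""
    rng = rng or random.Random(0)
    weights = [math.exp(t) for t in theta]
    remaining = list(range(len(theta)))
    ranking = []
    while remaining:
        total = sum(weights[i] for i in remaining)
        r = rng.random() * total  # uniform draw over the remaining mass
        acc = 0.0
        for i in remaining:
            acc += weights[i]
            if r <= acc:
                ranking.append(i)
                remaining.remove(i)
                break
    return ranking
```

A partially observed ranking, as in the paper's setting, would then correspond to revealing only some positions (or only a top-k prefix) of a sample like this.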