Paper ID: 3 Title: Stochastically Transitive Models for Pairwise Comparisons: Statistical and Computational Issues Review #1 ===== Summary of the paper (Summarize the main claims/contributions of the paper.): The paper considers the problem of estimating the preference matrix P w.r.t. squared Frobenius norm. The main focus is the class of P that satisfies strong stochastic transitivity (C_SST). The first significant result is that the popular parametric class of P that is defined through an n-dim vector, i.e., 1-d representation of each item, (call it C_par) does not work uniformly well in the SST class — for every n, there is a P^* for which any valid estimator in C_PAR fails to do better than random guess. Then, the authors establish a lower bound and an upper bound on the recovery for C_SST. Since the upper bound is hard to achieve in polynomial time, the authors propose a polynomial time algorithm that has O(1/\sqrt(n)) convergence rate. The authors then argue that there exists a subclass of C_SST that is named C_HIGH (high snr) for which there is an algorithm that has rate O(1/n) rather than O(1/sqrt(n)), which matches the lowerbound in C_SST upto logarithmic factor. Next, it is shown that O(1/n) is optimal within a more restrictive C_PAR, so the authors claim that their algorithm for C_HIGH is superior than parametric models (same performance with less assumption). Empirical evaluation verifies well the theoretical claims. Clarity - Justification: The paper is written clearly overall. However, I think the comparison between the results in C_HIGH and C_PAR is a bit confusing since C_HIGH is not a superset of C_PAR, if I am not misunderstanding. Can one achieve a better bound for C_PAR if it is restricted to high snr as in C_HIGH? Significance - Justification: I think the results are all solid within the scope defined by the authors. It would be great if one can consider observing one entry more than once (say $m$ times). Then, how does the error behaves with increasing $m$? I don’t know if it is an trivial extension. I’d like to know the authors’ opinion. Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): 613: is show that => is to show that 678: the the => the ===== Review #2 ===== Summary of the paper (Summarize the main claims/contributions of the paper.): This paper studies the estimation of pairwise comparison probabilities on an item set when the matrix of these probabilities satisfy a property called ``strong stochastic transitivity’’ (SST). This SST class includes well known parametric models such as Bradley-Terry-Luce and the Thurstone models and the authors discuss that in general it can capture pairwise comparison models that are very far from any parametric model in a certain sense. The main contributions include (1) bounding the minimax estimation rates for matrices in this SST class, (2) analyzing an polytime SVD based estimation algorithm that doesn’t achieve the minimax rate, and then proposing an algorithm that *does* achieve the minimax rate when the pairwise comparison probabilities are bounded away from 1/2. Clarity - Justification: Overall I thought that this paper was very well written and that the authors have made the main results quite accessible even though the proof techniques seem sophisticated — I hope this paper gets accepted. Significance - Justification: It's somewhat unclear if the results have practical implications for machine learning, but studying SST does indeed significantly (and elegantly) generalize our knowledge of estimation of pairwise probabilities Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): Other thoughts: + I wonder if the authors have considered the case where partial observations follow somewhat more realistic models. For example, often several objects are far more frequently observed than others (one might imagine a power law distribution over item frequency). What could we say there? + Though SST does seem quite a general notion, is there a notion of WST (weak ST)? One limitation seems to be that transitivity is always consistent with a single global permutation. But this somehow seems wrong if we’re talking about preferences of a population of people… is there a notion of stochastic transitivity where the transitivity must hold with respect to a random permutation draw from some distribution? ===== Review #3 ===== Summary of the paper (Summarize the main claims/contributions of the paper.): This paper considers recovering the true n by n matrix M* given one observation. The estimation is up to some matrix constraints on M*. In detail, entries in the observation are independent Bernoulli distributed, and the target is its expectation. The authors derive minimax lower bounds on this problem, and prove upper bounds on LSE and a computationally efficient alternative. Clarity - Justification: The paper is very easy to follow, and the results are presented beautifully. Significance - Justification: The work is very solid, and the improvement over existing ones is visible. Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): There are several things I would love to mention: (1) The authors need to spend more time justifying the SST model. In particular, it will be desirable to know why SST is acceptable in the psychology literature. (2) If Y's entries are indicators of pairwise comparisons, it is very weird to assume they are mutually independent. (3) If I understand correctly, the singular value thresholding estimator in Section 3.2 is not guaranteed to be SST (I thought it would be SST when the sample size is large and SNR is high.). If so, it is a concern and should be addressed. (4) The numerical results are very unsatisfactory. It shows that the empirical improvement over the existing ones is very minor. =====