Paper ID: 299
Title: Exact Exponent in Optimal Rates for Crowdsourcing

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper studies the problem of inferring the correct labels for a set of examples by querying sometimes-unreliable workers in a crowdsourcing setup. The authors characterize the optimal risk bound in terms of the workers' confusion matrices (the probability that worker i assigns label g to a random sample whose true class is h). While previous analyses in the literature have characterized the general asymptotic form of the optimal risk bound, the present work nails down the precise asymptotic constant in the exponent (the rates being exponential). They establish the bound first in the (arguably unrealistic) setting where the worker confusion matrices are known, and then complete the study by showing that the rates continue to hold when a consistent estimator of the confusion matrices is substituted. This analysis can then be applied to known methods for estimating the confusion matrices, such as the spectral method discussed in Section 4.1.

Clarity - Justification:
The paper is clearly written.

Significance - Justification:
They resolve the exact constants in the exponent.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I found this to be a solid paper. While the actual improvements may be incremental (and the form of the bound is basically what one would expect), the fact that the authors provide a sharp answer to the question of optimal rates (albeit under some restrictions on the problem) makes the results valuable, and this paper may become a standard reference for these bounds in the future literature. To me, the most significant problem in the paper is that the authors seem to have (implicitly) assumed that the workers' predictions X_{ij} are independent across the workers i (e.g., this seems to be used in equation (14) in the proof).
First, this assumption needs to be stated somewhere (i.e., in Section 2). Second, it may warrant some discussion, since it might very often be violated in practice. Specifically, in crowdsourcing the workers are not merely given the label y and asked to produce a random prediction; rather, they are given an instance description x (which is not modeled in this theoretical setting), and we should expect the worker predictions to depend not only on the correct label y of x but also on x itself. So while it may be fairly realistic to assume the worker predictions X_{ij} are *conditionally* independent across i given the corresponding instance x_j, it seems significantly less realistic to assume independence given merely the correct label y_j. (One can circumvent this problem by embedding x_j into y_j itself, but then the conditions of the theorem would fail.) Nevertheless, the paper provides a clean answer to the simple setting the authors were able to study, so I recommend acceptance. We can hope that the above independence assumption can be relaxed in future work.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper computes the asymptotic error-decay exponent for the multi-expert, multi-label crowdsourcing problem when the confusion matrix is known. Spectral and EM approaches are proposed in the "adaptive" regime, where the confusion matrix is unknown.

Clarity - Justification:
The paper is well-written, well-organized, and easy to follow.

Significance - Justification:
Crowdsourcing is an important problem, and computing minimax exponents is of non-trivial significance. The techniques appear to be standard; if there is any significant innovation in the proof methods, the authors should point it out.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I see at least two serious omissions in the related work survey. One is the paper by Parisi et al.
[2], which gives an efficient spectral method that is robust to some malicious types of collusion. The other is [1], which gives a complete finite-sample analysis of the two-label case. I would like to see both of these works discussed in relation to this paper's contributions.

[1] Daniel Berend and Aryeh Kontorovich. A finite sample analysis of the Naive Bayes classifier. Journal of Machine Learning Research, 16:1519-1545, 2015.

[2] Fabio Parisi, Francesco Strino, Boaz Nadler, and Yuval Kluger. Ranking and combining multiple predictors without labeled data. PNAS, 111(4):1253-1258, 2014. doi:10.1073/pnas.1219097111.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper establishes matching lower and upper bounds on classification error under the Dawid-Skene model. The key quantity is the "worker-average Chernoff information". The upper bound proof involves a standard analysis of the maximum likelihood estimator, together with an analysis of an approximate MLE in which the confusion matrices are estimated from the data. The lower bound carefully analyzes the two-class misclassification rate.

Clarity - Justification:
The paper is well-organized, with clear notation and math displays.

Significance - Justification:
The matching lower and upper bounds for crowdsourcing seem to be new and provide good insight into this topic. The upper bound algorithm does not seem to be very practical, though.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The derivation of zero error using Markov's inequality from line 216 to line 228 seems to require $m I(\pi)$ to grow to infinity. If this is the case, please clarify by stating this requirement; otherwise, the application of Markov's inequality is not quite clear here. In the lower bound proof, equations (15) and (16), it seems that the main result treats $k$ as a fixed constant.
This should be made clear in the introduction and in the statement of the main results. Lines 625-626: there must be a typo in the middle part of the inequality chain. Also, this inequality holds only when $m$ is large enough; this requirement either needs to be stated in the theorem, or the argument here needs to be made more precise.

=====
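To make the modeling point raised in Reviews #1 and #3 concrete, here is a minimal simulation sketch of the standard Dawid-Skene setup: each worker i has a confusion matrix, worker reports X_{ij} are drawn independently across workers given only the true label y_j (exactly the assumption Reviewer #1 says must be stated), and labels are recovered by the oracle MLE that uses the known confusion matrices (the estimator analyzed in the upper bound). The numerical choices below (15 workers, 200 items, 3 classes, 0.85-accurate workers) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 15, 200, 3  # workers, items, classes (illustrative values)

# Confusion matrices: pi[i, h, g] = P(worker i reports g | true class h).
# Assumption: every worker is 0.85-accurate with uniform errors.
pi = np.full((m, k, k), 0.15 / (k - 1))
for i in range(m):
    pi[i][np.diag_indices(k)] = 0.85

y = rng.integers(k, size=n)  # true labels

# The flagged assumption: given y_j alone, reports X_{ij} are drawn
# independently across workers i (no shared instance x_j in the model).
X = np.empty((m, n), dtype=int)
for i in range(m):
    for j in range(n):
        X[i, j] = rng.choice(k, p=pi[i, y[j]])

# Oracle MLE with known confusion matrices: the likelihood factorizes
# over workers -- which is precisely where independence enters.
loglik = np.zeros((n, k))
for g in range(k):
    for i in range(m):
        loglik[:, g] += np.log(pi[i, g, X[i]])
y_hat = loglik.argmax(axis=1)

error = (y_hat != y).mean()
print(error)
```

With these settings the per-item misclassification probability decays exponentially in m (the rate whose exact exponent the paper pins down), so the empirical error here is essentially zero.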