We very much appreciate the valuable comments, effort, and time the reviewers spent evaluating the paper. We first respond to the major comments shared by the three reviewers, and then give detailed responses to each reviewer.

Q1: Discussion on our contribution and assumptions
R1: Our main contribution is that this work is the first to show the optimality of BP, via the new lower bound we introduce. Although BP is not a new algorithm and has been successful in many different areas, its theoretical understanding is limited and its performance is hard to analyze in general. We prove its optimality when each worker is assigned at most 2 tasks (r = 2) and each task is assigned a sufficient number of workers (\ell = \Omega(1)). Despite the challenge with r >= 3, which Reviewer 3 also acknowledged, we establish the relative dominance of BP over MV and KOS for all r and \ell. In addition, our experiments show that the performance of BP almost matches the lower bound for all r and \ell; as Reviewer 3 mentioned, the promising performance of BP on real datasets is significant. We believe our work is important in particular because it eliminates the possibility of designing other types of algorithms, whereas a variety of algorithms have been proposed in the literature to attack this problem. Our work also motivates proving the optimality of BP in the general crowdsourced classification model with r >= 3, which we believe is an important open question in this area.

Q2: Analysis with finite n and experiments with moderate n
R2: We assume an infinitely large number of tasks, n, only to simplify the presentation of our results. Theorems 1 and 2 can also be stated with finite n instead of infinite n; e.g., the gap between the error rates of the optimal algorithm and BP in Theorem 1 is bounded by 2^{-\log\log n} + 2(\log n)^{2\log\log n + 1}/n, which goes to 0 as n goes to infinity. We will add this in the final draft. In our experiments, we actually tested both moderate and large n cases.
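As a quick numerical sanity check of the finite-n gap bound stated in R2, the following sketch evaluates 2^{-loglog n} + 2(log n)^{2 loglog n + 1}/n and confirms it shrinks as n grows. Natural logarithms are assumed here purely for illustration; the logarithm base and the function name `gap_bound` are our own choices, not fixed by the statement above.

```python
import math

def gap_bound(n):
    # Finite-n bound on the gap between the optimal estimator and BP
    # (cf. Theorem 1); natural logarithms are assumed for this sketch.
    ll = math.log(math.log(n))
    return 2.0 ** (-ll) + 2.0 * math.log(n) ** (2.0 * ll + 1.0) / n

# The bound vanishes as n grows, slowly, driven by the 2^{-loglog n} term.
```

The second term decays essentially polynomially in n, so for large n the 2^{-loglog n} term dominates and sets the (slow) overall rate.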
However, we report only the results for moderate n because it is actually the more challenging regime for BP. For completeness, we will also include the experimental results for large n, which likewise show the optimality of BP, in the final draft.

For Reviewer 1
- We use Lemma 3 in lines 590-614. The conditions on \ell and r are needed to show the decaying correlation between the root and the information on the leaves (Lemma 3), which implies the diminishing gap (the first term in (15)) between the oracle estimator and the BP estimator.
- See R2 for the analysis with finite n.
- The main difference between BP and EBP lies in the knowledge of the fixed worker reliability distribution (see lines 229-251): BP assumes this knowledge, whereas EBP estimates it from the data and iteratively runs BP with the estimate.
- In our experiments, we choose k = 100. The choice of k is not sensitive, since we observed that BP typically converges quickly (i.e., within 20 iterations when n = 200). This means that the experimental results include loops in BP, and BP is close to optimal in the experiments (in all regimes). We will incorporate these comments in the final draft.
- We need the MAP estimator maximizing the marginal probability p(s_i | A) because we focus on minimizing the average 'bit-wise' error rate.
- We did provide the reference for EM, in line 748.

For Reviewer 2
- See R1 and R2 for the discussion of our contribution and experiments.
- We would like to correct Reviewer 2's comment on our assumption for BP optimality: we show BP optimality when each worker is assigned at most 2 tasks, not when each task is assigned at most 2 workers. Each task can be assigned an arbitrary number of workers.
- We use "exactly optimal" to distinguish our notion of optimality from the near-optimality in (Karger et al. '11), where there is a gap of a constant factor.
- Reviewer 2's understanding of the step from (14) to (15) is correct. One could also use 1/2 (instead of 1), which yields a gain in the constant factor.
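To make the bit-wise criterion above concrete, here is a toy brute-force sketch of the marginal MAP estimator argmax_s p(s_i = s | A) under a one-coin worker model. This is not the paper's BP algorithm: the 3-task/3-worker instance, the spammer-hammer reliability values, and all variable names are our own illustrative choices.

```python
import itertools

# Hypothetical 3-task, 3-worker instance; answers[(worker, task)] in {+1, -1}.
answers = {
    (0, 0): +1, (0, 1): +1, (0, 2): -1,
    (1, 0): +1, (1, 1): -1, (1, 2): -1,
    (2, 0): +1, (2, 1): +1, (2, 2): +1,
}
n_tasks, n_workers = 3, 3
reliabilities = [0.5, 1.0]  # spammer-hammer: p = 1/2 or p = 1, equal prior

def likelihood(s, ps):
    """p(A | s, p) under the one-coin model: worker w is correct w.p. ps[w]."""
    out = 1.0
    for (w, t), a in answers.items():
        out *= ps[w] if a == s[t] else 1.0 - ps[w]
    return out

# p(s_i = +1 | A) by brute-force summation over labelings and reliabilities.
marg = [0.0] * n_tasks
total = 0.0
for s in itertools.product([+1, -1], repeat=n_tasks):
    for ps in itertools.product(reliabilities, repeat=n_workers):
        weight = likelihood(s, ps)
        total += weight
        for i in range(n_tasks):
            if s[i] == +1:
                marg[i] += weight

# Bit-wise MAP: decide each task separately from its own marginal.
bitwise_map = [+1 if m / total >= 0.5 else -1 for m in marg]
```

Deciding each coordinate from its marginal minimizes the expected number of mislabeled tasks, i.e., the average bit-wise error rate; maximizing the joint posterior p(s | A) would instead minimize the block error, which is a different objective.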
For Reviewer 3
- We greatly appreciate Reviewer 3's editorial comments on the presentation of our results. We will tone down the claims and incorporate all comments in the final draft.
- We agree that "without loss of generality" is misleading. We consider arbitrarily given but latent labels, which is identical to assuming each label is +1 or -1 with equal probability. One can certainly improve the performance of BP if the distribution of labels is known in advance. We will clarify this in the final draft.
- We thank Reviewer 3 for correcting our citation of the censored block model. We will correct it in the final draft.
- We meant that our model with r >= 3 might correspond to the stochastic block model (SBM) with hyper-edges, as mentioned in lines 158-166, where the SBM with multiple groups might map to our model with multi-alphabet labels. Reviewer 3's idea could, however, be used to extend our work to the multi-alphabet label scenario.
- See R2 for the discussion of our experimental results.