We thank the reviewers for their comments, which will certainly improve our paper. Below we address the major comments.

Reviewer 1

We thank the reviewer for many helpful comments.

1. Being more explicit about constants in phrases like "with high prob.": This is certainly a valid point. Here, "with high prob." means "with probability exceeding 1 - C1 n exp(-(2m/n)(1 - e^D)) - C2 r^{-3}", where C1 and C2 are constants depending on the specific graph. We will be more explicit in the revision.

2. Cross-referencing errors / typos in the supplement: Thanks for pointing these out. We will polish the supplement.

3. Effect of graph topology in sparser / more general graphs: With a more refined argument we might be able to accommodate sparser graphs with r > log n. In the regime where r = O(1), our algorithm might not work, as it cannot preclude bursty errors; however, one can compute the MLE in polynomial time via dynamic programming. Our paper is a first step towards more complicated graphs, and it would be very interesting to see how far our theory can be extended beyond these simple graphs.

4. Pre-specified ordering of nodes and unknown graph topology: Thanks for pointing out this missing discussion. For grids, we can choose the core subgraph to be a subsquare of area r^2 lying at the center of the grid, and then spiral outwards progressively (see the sketch below). For small-world graphs, it suffices to follow the same ordering as for rings. In addition, the current approach relies heavily on knowledge of the graph topology (or at least knowledge of the connectivity radius in an order-wise tight manner). It would be interesting to analyze belief propagation for these graphs, which might not need the graph topology a priori.
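To make the grid ordering concrete, the following is a minimal Python sketch of one way to realize it; the function name, the square-grid representation, and the use of Chebyshev rings are our illustration for this response, not code from the paper.

```python
import math

def spiral_ordering(s, r):
    """Order the nodes of an s x s grid: a central (roughly) r x r core
    subsquare first, then successive square rings spiraling outwards.
    Returns a list of (row, col) pairs covering all s*s nodes."""
    c = s // 2  # center coordinate
    # Chebyshev (sup-norm) distance from the center; its level sets are
    # the square rings around the core.
    def ring(v):
        return max(abs(v[0] - c), abs(v[1] - c))
    nodes = [(i, j) for i in range(s) for j in range(s)]
    half = (r + 1) // 2  # core = all nodes with ring index below half
    core = [v for v in nodes if ring(v) < half]
    rest = [v for v in nodes if ring(v) >= half]
    # Within each ring, sort by angle so the ordering walks around the
    # ring; each newly added node then has many already-ordered nodes
    # within its connectivity radius.
    rest.sort(key=lambda v: (ring(v), math.atan2(v[0] - c, v[1] - c)))
    return core + rest

# Example: a 9 x 9 grid with a 3 x 3 core at the center.
order = spiral_ordering(9, 3)
assert len(order) == 81 and len(set(order)) == 81
```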
5. Proof of Stage 2 in the supplement: Indeed, the analysis of Stage 2 is a key step. We will elaborate on the key steps and improve the presentation in the revision.

6. Boundary case: While the current analysis focuses on the boundary case, we can extend it to more general situations. For spectral initialization, the regime we are working in is actually easier than the boundary regime in Chin et al. (we can use Talagrand's inequality to control the deviation of the sample matrix in the desired manner). We will include the details in the revision.

Reviewer 2

We appreciate the reviewer's high appraisal of our paper. The suggestions regarding haplotype phasing and proof sketches are both excellent. We will introduce the motivation from haplotype phasing earlier, in the introduction rather than in the problem setup section, and will briefly outline the key steps of the proofs to make the paper more self-contained. In addition, we will list a few interesting future directions in the discussion section.

Reviewer 4

We appreciate the reviewer's comments.

1. Technical novelty: We would like to emphasize that our paper develops the first algorithm that achieves the optimal information and computation limits for graphs with locality. The theory of prior methods (e.g. Abbe et al. 2015 and Hajek et al. 2015b), when applied to these graphs, delivers sample complexities that are at least O(n) times larger than the correct limits. Moreover, while Stages 1 and 3 of our algorithm share similarities with prior algorithms, the analyses therein cannot be readily applied here. Specifically,
a) Stage 2 proceeds in a progressive manner, meaning that the error events associated with distinct nodes are highly dependent, and one needs to preclude cascading failures. This requires carefully decoupling the dependencies via proper conditioning techniques.
b) Approximate recovery alone is not sufficient to ensure perfect clean-up via Stage 3. We also need to preclude bursty errors, i.e. the errors occurring in Stages 2 and 3 must be spread out within the neighborhood of each node. Since we reuse all samples in all stages, we need to control the spatial distribution of errors in each iteration (and there are complicated dependencies across iterations). This concern does not arise in fully connected graphs, as they are geometry-free.

2. Experiments on real data: In our revision, we plan to evaluate our algorithm on a high-throughput sequencing dataset from 10X Genomics, available for download from reference (10X, 2016). This is a dataset with 1.2 billion reads from the human genome sample NA12878, conforming to the model discussed in Section 4.2, with each measurement covering about L = 11 nodes. For this data, the error rate is p ≈ 0.01, and the number of reads available is about 7 times the information limit predicted by our theory. By subsampling the reads, we can test how well our algorithm performs relative to this limit on real data (see the sketch at the end of this response).

3. Fully observed nodes: Our approach does not actually require the existence of a fully connected core subgraph. What we need is a subgraph that is sufficiently connected (i.e. each vertex is connected to a constant fraction of the other nodes in the subgraph). We will discuss this point in the revision.
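To illustrate the subsampling experiment mentioned under Reviewer 4, point 2, here is a minimal Python sketch; `algorithm` (standing in for our three-stage procedure), `truth`, and the grid of sampling fractions are hypothetical placeholders for this response, not our actual evaluation code.

```python
import random

def subsample_curve(reads, m_star, algorithm, truth,
                    fractions=(1.0, 1.5, 2, 3, 5, 7), trials=10):
    """Estimate the misclassification rate as a function of sample size,
    measured in multiples of the information limit m_star.

    reads:     the full list of measurements (e.g. linked reads)
    m_star:    the information limit predicted by the theory
    algorithm: placeholder for the recovery procedure, reads -> labels
    truth:     ground-truth labels, used only for scoring
    """
    curve = {}
    for f in fractions:
        m = min(int(f * m_star), len(reads))
        errors = []
        for _ in range(trials):
            sample = random.sample(reads, m)
            estimate = algorithm(sample)
            # Binary labels are identifiable only up to a global flip,
            # so score against both the truth and its flipped version.
            mismatch = sum(a != b for a, b in zip(estimate, truth))
            errors.append(min(mismatch, len(truth) - mismatch) / len(truth))
        curve[f] = sum(errors) / trials
    return curve
```

Plotting the resulting curve against f would then show how close the empirical phase transition on real data falls to the predicted limit f = 1.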