We thank the reviewers for their positive comments. Below, we answer the questions they raised.

1. Reviewer 1:

Q: To what extent are the results tied to the Gaussian assumption on the measurement vectors a_i? Does sub-Gaussianity suffice? What about other measurement procedures that are more suitable for actual implementation?

A: We thank the reviewer for this insightful question. The Gaussian assumption on a_i mainly makes the technical analysis tractable. Under a sub-Gaussian assumption, some steps of the analysis become more complicated and require further work. For example, the derivation of r in the proof of Proposition 2 would need additional effort, and a result analogous to Lemma 4 would need to be developed for the specific sub-Gaussian distribution. We expect our median-based algorithm to remain robust under other measurement procedures. For example, we applied our algorithm to the coded diffraction model to recover images, and it performs well numerically. Performance guarantees under these models require further exploration.

Q: Can we achieve a stronger error bound if one imposes an additional independence assumption on dense noise w?

A: The current error bound holds for any given (deterministic) dense noise vector w. The order statistic inequality (Lemma 2) is the essential property that determines the error bound under dense noise w, and the infinity norm of w enters the error bound via this inequality. Since this property does not rely on the distribution of w, an additional independence assumption on w does not improve the error bound (at least within our analysis framework).

2. Reviewer 2:

Q: line 95: m is on the order of n (instead of: m is on the order of O(n)).

A: We will make the change.

Q: Related work: gradient descent type algorithms following spectral initialization goes back to work by Keshavan et al. for matrix completion.

A: We will cite the reference properly.

3. Reviewer 3:

Q: The motivation to study such a problem, especially in the setting specified in the paper. Due to initialization scheme etc, the result is restricted to Gaussian measurements which have very limited application in general.

A: Algorithmically, our median-based algorithm does not depend on the Gaussian assumption on the measurement vectors; it involves only evaluating the median of the observed data. However, the Gaussian assumption simplifies the technical analysis and provides useful insight into the essential properties that yield the performance guarantee. We anticipate that these essential properties can be generalized to other measurement models (such as sub-Gaussian models), as discussed in our answer to Reviewer 1's first question.

Q: Moreover, presence of several outliers for such models is not well motivated.

A: Outliers are typically due to equipment failure and recording errors, which can occur often in practice. Moreover, our algorithm resists such outliers blindly, without knowing in advance whether outliers are present or how many there are.

Q: The removal of outlier measurements is similar to the hard thresholding based robust regression algorithm studied in Bhatia et al. NIPS'2015. Hence, it would be useful if authors can discuss the techniques of the two papers and if there is a more unified theoretical analysis.

A: We thank the reviewer for suggesting a comparison between our algorithm and the thresholding algorithm in (Bhatia et al. 2015). The two algorithms and their proof techniques are different. Our algorithm uses the median of the observations as the truncation threshold, while (Bhatia et al. 2015) truncates the top k items, where k is the number of outliers. Hence, (Bhatia et al. 2015) requires knowing the total number k of outliers in advance, while our algorithm does not. Our performance analysis requires developing median-related properties, while (Bhatia et al. 2015) develops statistical properties of the measurements that remain after thresholding. Despite these differences, as the reviewer suggested, it would be interesting to explore whether a unified theoretical framework can be established to analyze robust algorithms for different scenarios.
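
To make the contrast between the two truncation rules concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the implementation from our paper or from (Bhatia et al. 2015): the function names, the constant `scale=3.0`, and the synthetic residuals are all hypothetical, and the rules are applied to generic scalar residuals rather than to the actual measurement model.

```python
import random
import statistics


def median_truncate(residuals, scale=3.0):
    """Keep indices whose absolute residual is at most scale * median.

    `scale` is a hypothetical tuning constant chosen for this illustration.
    The rule needs no knowledge of how many outliers are present.
    """
    abs_res = [abs(r) for r in residuals]
    threshold = scale * statistics.median(abs_res)
    return [i for i, r in enumerate(abs_res) if r <= threshold]


def top_k_truncate(residuals, k):
    """Drop the k largest absolute residuals; the caller must supply k."""
    order = sorted(range(len(residuals)), key=lambda i: abs(residuals[i]))
    return sorted(order[: len(residuals) - k])


def make_residuals(m=200, num_outliers=20, seed=0):
    """Synthetic residuals: unit Gaussian noise plus a few gross outliers."""
    rng = random.Random(seed)
    residuals = [rng.gauss(0.0, 1.0) for _ in range(m)]
    outlier_idx = set(rng.sample(range(m), num_outliers))
    for i in outlier_idx:
        residuals[i] += 1000.0  # gross corruption, e.g. a recording error
    return residuals, outlier_idx


residuals, outlier_idx = make_residuals()

# Median rule: no outlier count is passed in.
kept_median = set(median_truncate(residuals))

# Top-k rule with the true outlier count, and with an underestimate of it.
kept_topk_exact = set(top_k_truncate(residuals, k=len(outlier_idx)))
kept_topk_under = set(top_k_truncate(residuals, k=5))

print("outliers kept by median rule:   ", len(kept_median & outlier_idx))
print("outliers kept by top-k (exact k):", len(kept_topk_exact & outlier_idx))
print("outliers kept by top-k (k = 5):  ", len(kept_topk_under & outlier_idx))
```

In this toy setting the median rule discards every corrupted sample without being told how many there are, at the cost of also discarding a few clean samples with large residuals. The top-k rule matches it only when k is set correctly, and keeps corrupted samples when k is underestimated.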