We thank the reviewers for their insightful comments and respond to them below.

Assigned_Reviewer_3:

Q1: The paper also proves the identifiability result explicitly, although on the other hand having an algorithm that learns the correct parameter is already a proof of identifiability.
R1: No. The algorithm only proves that there is a unique DOMINANT NMF. Our Theorem 2 proves that if there is a dominant NMF, it is the unique NMF (dominant or not; B', C' are not assumed to satisfy (D1)-(D3)). This distinction is important, since the existence of a second, non-dominant NMF would defeat the point of uniqueness.

Q2: The reviewer's main concern is that the paper should compare with previous algorithms more fairly. For example, from the abstract, it sounds like the dominant NMF model has strictly weaker assumptions than separable NMF, which is not true...
R2: The abstract carefully says ``Our assumption on B is weaker than separability...''. We do not claim the same of C (D2 and D3 are on C); we only claim ``empirical justification for our assumptions on C''.

Q3: For topic modelling, many of the NMF algorithms that do not handle such high noise should be applied to the word-word correlation matrix (as is done in Arora et al.), which really reduces the noise to a tolerable level when there are enough documents. It would be interesting to see how the experimental results would change.
R3: Literally correct. But the \varepsilon in Theorem 2.2 of Arora et al. is PER-WORD error, so to get per-topic error of \varepsilon (as we have), one needs O(d^2) documents. [Indeed, for two words each with frequency O(1/d), one needs n \geq d^2 documents to estimate their co-occurrence; see the sketch at the end of this response.] It is arguable whether O(d^2) documents is reasonable.

Assigned_Reviewer_4:

Q1: Equations in the middle of a line of text should not contain \frac quotients. Capital letters should not be used for words like MOST and EACH; they should be emphasized using italic or bold letters.
R1: We will correct these formatting issues in the final version.

Q2: In the empirical section the authors compare their method with other "provable" methods. They make an exception for k-means, which is "not provable but popular". These statements are wrong. It's not because something hasn't yet been proven that it is not provable. Please remove these sentences.
R2: By calling a method non-provable we meant that it has not been proven to be robust. Recognizing the ambiguity of this phrasing, we will correct it in the final version.

Assigned_Reviewer_5:

Q1: In the synthetic experiment, the ground truth seems to be designed according to the assumptions required for the theorem to hold true. Is this the case? If yes, how restrictive is that, and what happens under more general assumptions?
R1: In Section 5.2, we discuss TWO sets of synthetic experiments to compare the robustness of the NMF algorithms. The FIRST set of experiments follows the experimental settings of Gillis & Luce to generate the ground-truth data, which assumes separability on B but no assumptions on C. So the validity of the dominant assumption on C (assumed by the theorem) is not ensured by the ground-truth data in this case. In this setting, we found TSVDNMF to be superior to the existing benchmarks, particularly in the presence of high noise. This shows the applicability of our method in conventional settings, not restricted to the dominant assumption. See the paragraph beginning at line 679 and Table 2 for details.
In the SECOND set of experiments, intended to show the effectiveness of TSVDNMF under the dominant assumption, we ensure that the ground-truth data satisfies the dominant assumption on both B and C. The dominant assumption on B is much more general than the separability assumption, and for the dominant assumption on C we provide empirical justification in Section 5.1.
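Sketch behind the d^2 claim in R3 to Reviewer 3 (a back-of-the-envelope calculation; the per-document occurrence model below is an illustrative assumption, not a setting from the paper): suppose words i and j each appear in a document with probability \Theta(1/d). Their co-occurrence probability is then p_{ij} = \Theta(1/d^2), and the empirical estimate of p_{ij} from n i.i.d. documents has standard deviation \sqrt{p_{ij}(1-p_{ij})/n} = \Theta(1/(d\sqrt{n})). For this to be small relative to p_{ij} itself, one needs 1/(d\sqrt{n}) \ll 1/d^2, i.e., n \gg d^2.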