Paper ID: 692
Title: Non-negative Matrix Factorization under Heavy Noise

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper describes a set of novel sufficient conditions that render NMF tractable, together with an algorithm. The new conditions extend the class of tractable NMF models, and the new algorithm seems to perform better than existing tractable algorithms for the same problem.

Clarity - Justification:
The paper seems to contain significant results, but it scores low in terms of readability. The material is far too technical (the paper builds heavily on Bansal et al., 2014), and often there is insufficient insight to support the theoretical claims/results.

Significance - Justification:
The paper seems to make a significant contribution theory-wise, and the proposed algorithm seems to improve on state-of-the-art algorithms by a large margin. However, the material and exposition are so technical and hard to follow that the contribution of the paper can only be tangentially appreciated.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
This looks like a good paper. I say "looks like" because the results look great, but I had a hard time following the technical details. The paper draws heavily on Bansal et al. (2014), with extensions that are pertinent to NMF. Unless one is familiar with the work of Bansal et al. (2014), the paper is almost impossible to read. One is left to judge the quality of the work from the experimental results, which seem good. (Section 5.1 is also interesting.)

In the synthetic experiment, the ground truth seems to be designed according to the assumptions required for the theorem to hold. Is this the case? If so, how restrictive is that, and what happens under more general assumptions?
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper studies the problem of nonnegative matrix factorization under heavy noise. The problem setting is that every column/row of the matrix may have very large noise (potentially even much larger than the norm of the column), but an average of many rows cannot have large noise. The paper is able to solve such a problem with very high noise under assumptions that are very similar to those in the topic modelling work of Bansal et al.

Clarity - Justification:
The paper is mostly clear. The reviewer especially liked that the paper justified the noise assumption carefully.

Significance - Justification:
Although at a high level the paper is very similar to the topic modelling paper of Bansal et al., the reviewer feels the proof is cleaner and there are some technical contributions.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The high-level technique of this paper (as the authors also mention in the discussion paragraph on page 5) is very similar to the work of Bansal et al. for topic models. Many of the assumptions (mostly D1-D3) are paraphrased from the topic modelling setting. Assumption D4 (the noise assumption) is one of the main contributions of the paper, as it gives a deterministic condition that is satisfied by many variants of probabilistic models, which includes topic modelling but goes beyond it. The algorithm is also very similar to Bansal et al.'s algorithm. Technically, the main difference is that many properties that were previously proved using randomness now have to rely on assumption D4. Also, the paper is able to get rid of the "no-local-min" property, which was probably the least reasonable assumption in that paper. The paper also proves the identifiability result explicitly, even when "dominance" is not assumed, which is interesting to know.
The paper also shows, via synthetic experiments, that the algorithm indeed outperforms previous separable NMF algorithms. The main concern the reviewer has is that the paper should compare with previous algorithms more fairly. For example, from the abstract it sounds like the dominant NMF model has strictly weaker assumptions than separable NMF, which is not true (because the model also requires D2 and D3, and although these are reasonable in topic modelling, they may not be as natural in other applications of NMF such as hyperspectral unmixing). [After author response: I did not say you said something that is formally incorrect in the abstract, but omitting facts can also lead to a biased view. Someone who is not familiar with both lines of work is more than likely, from the current abstract, to assume this work is strictly stronger than separability. I think you should still at least mention the additional constraints whenever you compare the two.]

Also, for topic modelling, many of the NMF algorithms that do not handle such high noise could be applied to the word-word correlation matrix (as is done in Arora et al.), which really reduces the noise to a tolerable level when there are enough documents. It would be interesting to see how the experimental results change. [After response: I'm not claiming Arora et al. have a nice sample complexity in theory. However, the sample complexity analysis there is clearly fairly loose, and hence I'm interested to see if it does anything reasonable in experiments.]

Overall I think this is a reasonable paper. Its high-level idea is indeed very similar to the previous work of Bansal et al., but I think there are enough new tools at the detailed level; the application to NMF and the formalization of assumption D4 are also nice contributions.
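The heavy-noise regime Review #2 summarizes (each column's noise can dwarf the column's norm, yet the average over many columns stays small) can be illustrated numerically. This is a minimal sketch under assumed zero-mean sign noise with arbitrary constants, not the paper's actual noise model:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 5000

# Clean nonnegative data: column norms are roughly sqrt(d/3) ~ 5.8 here.
signal = rng.random((d, n))

# Zero-mean, heavy per-entry noise: each noise column has norm 5*sqrt(d) = 50,
# far larger than the signal column norms.
noise = rng.choice([-5.0, 5.0], size=(d, n))

per_column_noise = np.linalg.norm(noise, axis=0).mean()   # ~50
averaged_noise = np.linalg.norm(noise.mean(axis=1))       # ~0.7 by concentration

# Per column the noise swamps the signal, yet the average over the
# n columns is tiny -- the regime assumption D4 is meant to capture.
```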
===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The authors claim to prove that the algorithm TSVDNMF has several good properties, in particular in the presence of heavy noise. The work is motivated by the presence of heavy noise in various NMF applications, such as topic modeling. Their notion of heavy noise is stronger than the heavy-noise models considered earlier. Their results establish stronger properties of the nonnegative factors compared to previous work. They illustrate their results using synthetic and real data. The TSVDNMF algorithm consists of (1) element-wise thresholding of the matrix [slightly more complex than hard-thresholding followed by an element-wise square root], (2) computing the low-rank approximation, and (3) NMF.

Clarity - Justification:
Globally hard to follow. The paper could have been better structured / more carefully written. Minor: equations in the middle of a line of text should not contain \frac quotients. Capital letters should not be used for words like MOST and EACH; they should be emphasized using italic or bold letters. This makes reading hard: is it an abbreviation or a word?

Significance - Justification:
The work is well motivated and deals with a serious practical problem. The algorithm is a bit hard to parse, and the statement of the algorithm relies heavily on the model parameters. The theoretical results seem to prove strong guarantees in the specific case considered. Overall I find the work interesting, but the narrow focus of both the algorithm and the results seems to be a weakness.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I think the algorithm is written in a way that makes the proof go smoothly. All the constants are derived from the conditions on the noise model. The theorem should be stated in more general terms to be usable outside the scope of this work. In the empirical section the authors compare their method with other "provable" methods.
They make an exception for k-means, which is "not provable but popular". These statements are wrong: the fact that something has not yet been proven does not make it unprovable. Please remove these sentences.

The different building blocks of the algorithm should be studied more thoroughly in the numerical experiments. Is the element-wise square root better than, for example, x -> log(1+x) or another concave increasing function?

=====
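For concreteness, the three-step pipeline Review #3 summarizes (threshold, concave transform, rank-k truncation, then NMF), and the sqrt-vs-log(1+x) comparison the review asks for, can be sketched as follows. This is a generic stand-in, not the authors' TSVDNMF: the threshold `zeta`, the `transform` argument, and the multiplicative-update NMF step are all illustrative assumptions.

```python
import numpy as np

def tsvd_nmf_sketch(A, k, zeta, transform=np.sqrt):
    """Hypothetical sketch of the reviewed pipeline: (1) hard-threshold plus a
    concave element-wise transform, (2) rank-k SVD truncation, (3) NMF.
    `zeta` and `transform` are illustrative, not the paper's constants;
    passing transform=np.log1p tries the reviewer's suggested alternative."""
    # Step 1: zero out small entries, then apply the concave transform.
    B = transform(np.where(A >= zeta, A, 0.0))

    # Step 2: best rank-k approximation via truncated SVD.
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    B_k = np.maximum((U[:, :k] * s[:k]) @ Vt[:k], 1e-12)

    # Step 3: plain multiplicative-update NMF as a generic stand-in
    # for the paper's (unspecified here) NMF step.
    rng = np.random.default_rng(0)
    m, n = B_k.shape
    W, H = rng.random((m, k)) + 0.1, rng.random((k, n)) + 0.1
    for _ in range(200):
        H *= (W.T @ B_k) / (W.T @ W @ H + 1e-12)
        W *= (B_k @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H
```

Swapping `transform=np.log1p` for the default square root is then a one-line change, which is the kind of ablation the review requests.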