Paper ID: 384
Title: PAC learning of Probabilistic Automaton based on the Method of Moments

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper studies the problem of proper learning for a special class of PFAs. The main result is an algorithm that properly learns this class, called PRFAs.

Clarity - Justification:
The reviewer is familiar with the field, so it is fairly easy to follow the arguments. However, spectral learning for PFAs and related models has always been somewhat heavy in notation and definitions, so the paper may not be easy to read for people who have not read previous work.

Significance - Justification:
The paper is based on two ideas: spectral learning of PFAs (which is not proper) and separable NMF. Both are well studied, but combining them is a new perspective.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
As mentioned in the paper, there are previous results on spectral learning for even richer models, but those results only learn to make predictions and do not actually output a model within a reasonable class. For special cases like Hidden Markov Models, there are results that can learn a model, starting from the paper "Learning nonsingular phylogenies and hidden Markov models" by Mossel and Roch (this citation is missing). This paper does not learn general PFAs; instead, it introduces a restricted class called probabilistic residual finite automata (PRFAs) and designs an algorithm that properly learns PRFAs. The additional property that PRFAs require is very similar to the separability assumption in nonnegative matrix factorization and the anchor-words assumption in topic models (this connection should be made clear earlier in the paper, when the definition is introduced).
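To make the separability analogy concrete: the anchor-finding step at the heart of (near-)separable NMF can be sketched with the successive projection algorithm, a standard greedy method. This is an illustrative sketch of that generic routine under the exact-separability assumption, not the paper's algorithm; the function name `spa` is ours.

```python
import numpy as np

def spa(M, r):
    """Successive projection algorithm for near-separable NMF.

    Greedily selects r columns of M (the 'anchors') whose conic hull
    approximately contains all remaining columns.
    """
    R = M.astype(float).copy()
    anchors = []
    for _ in range(r):
        # Pick the column with the largest residual norm as the next anchor.
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        anchors.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        # Project all columns onto the orthogonal complement of the anchor.
        R = R - np.outer(u, u @ R)
    return anchors
```

For a strictly separable input, e.g. two anchor columns plus a mixture column, the two anchors are recovered exactly; with noisy moment estimates the same greedy selection is known to be robust under suitable conditions.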
Roughly speaking, the property says that for each state there is a prefix such that the state reached after reading that prefix must be (or is effectively equivalent to) that state. The algorithm is then a combination of the traditional spectral algorithms for learning PFAs and the algorithms for (near-)separable NMF. The paper shows the algorithm is consistent and gives polynomial sample complexity and running time bounds. The paper also reports experiments showing that the algorithm works reasonably well in practice. Overall, this is an interesting extension of separable NMF algorithms to the more complicated case of PRFAs.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper presents a PAC algorithm for learning probabilistic residual finite automata (PRFAs). This class of automata has been studied in the literature, and it is well known that the residual-finiteness assumption can lead to simplified learning problems. By exploiting this assumption, the paper shows how a method of moments for learning PRFAs can be obtained by solving a near-separable non-negative matrix factorization problem. Combining known results about the computational complexity of separable NMF problems and their robustness to noise, the paper uses somewhat standard techniques to give a polynomial-time algorithm with polynomial sample complexity for the problem of learning PRFAs. The main novelty of the work is that the proposed algorithm avoids the problem of "negative probabilities" typical of previous spectral methods, and also avoids the restrictions imposed by other approaches based on symmetric tensor decompositions.

Clarity - Justification:
In general, the paper is well written, and the experimental evaluation is convincing, though some aspects could be explored in more detail (see comments below).

Significance - Justification:
The techniques used in the paper are known, but the way in which they are combined yields a new, interesting result.

Detailed comments.
(Explain the basis for your ratings while providing constructive feedback.):
[L273] The reduction of LPWN to PFA learning was already given in Kearns et al., STOC'94. In addition, the LPWN problem is conjectured (but not known) to be hard.
[L303] The claim about the existence of strings reaching each state is only valid if the PRFA is minimal.
[L490] Is the algorithm also improper in the limit? It seems that if the residuals are known exactly, the recovered automaton will be probabilistic without any constraints, and therefore it will be a PRFA by construction. On the other hand, the spectral algorithm is improper even in the limit. Perhaps this is a distinction worth making.
[Theorem 4] I think the sample complexity should depend on t^4 instead of t^2.
[Section 5.2] Some of the problems in the PAutomaC dataset correspond to deterministic PFAs, which are a particular case of PRFAs. In those cases one would expect the algorithm based on separable NMF to do better than NNSpectral. Is that the case?
[L760] What is the BPC similarity used in the experiments? How are negative values assigned by the spectral algorithm treated in this metric? Is it possible that the poor performance of Spectral in this case is due to the treatment given to negative values?

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper addresses the problem of learning a distribution with latent variables over sequences of symbols from independent samples drawn from it. It uses the method of moments to learn PRFAs (a subclass of PFAs), provides a sample complexity analysis, and empirically compares with a number of recent method-of-moments approaches and EM. They provide PAC-style guarantees for their method.

Clarity - Justification:
The paper is nicely written in terms of English, though its heavily technical theme makes it hard to follow if some of the concepts are new to the reader.
There are a couple of typos in the tables, where "Tenseur" appears instead of "tensor", which is odd.

Significance - Justification:
This is an interesting new result, and the considered problem is not an easy one. An interesting question is how to extend this method to learn a wider class.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper considers an interesting problem, though it provides a method for learning only a subclass of PFAs. The authors provide PAC-style guarantees for their algorithm and empirically show improvements over earlier MoM-based methods and EM, which is interesting. The paper would surely benefit from more intuition about the method and a less technical explanation of the facts.

=====