Paper ID: 1342
Title: Stochastic Discrete Clenshaw-Curtis Quadrature

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.): The paper proposes a new approach to approximate the partition function (normalization constant) of intractable probabilistic models. The key idea is to use a polynomial approximation (based on Chebyshev polynomials) to the unnormalized probability measure, which is assumed to be in the exponential family. Under some mild assumptions on the sufficient statistics (satisfied, for example, by the canonical overcomplete representation of an MRF), it is possible to evaluate the integral of the approximation in closed form. An error analysis as a function of the degree k of the polynomial is provided.

Clarity - Justification: The writeup should be improved. There are undefined symbols, and some of the technical results appear correct but should be formalized with more care (see below for detailed comments).

Significance - Justification: The idea is very innovative; I haven't seen anything along these lines. I have some reservations about how practical this approach is (see below), but the idea seems worthy and novel. There are only a handful of known approaches to approximate inference with intractable distributions, and this paper provides an interesting alternative.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The runtime is claimed to scale as n^{2k}. The experiments reported have n = 128 * 128 > 10000 and k = 16, so n^{2k} > (10000)^{32}, which seems far too big. Are there hidden constants in the big-O notation?

In the introduction and table, the relationship between k and epsilon should be made clear. As written, it seems that epsilon and k are independent, which doesn't make sense: typically epsilon is the input parameter, and one picks a k that gives the desired accuracy. Similarly for the stochastic version:
epsilon and the probability of failure should also affect the reported runtime.

Additional feedback on the writeup:
- It would be helpful to have a theorem that clearly states all the assumptions and the result: "assuming the model is an exponential family, the sufficient statistics are integrable and ..., and k > ..., the error is at most epsilon." Right now the assumptions are scattered throughout the paper, which makes the argument harder to follow. It would also help to clarify how the result fits in with known hardness results for inference.
- Line 148: symbols are undefined. It seems that one should just choose m = 0 to get the best runtime.
- Definition 1: the quantification over j is missing. What is the domain of j? Does the condition need to hold for all j?
- Line 466: it would be helpful to clarify how to go from 464 to 466 (and to refer to Definition 1 for the notation).
- Line 468: I didn't get how you obtain the asymptotic complexity.
- I'd like to see more details on how the coefficients are computed (eq. 9, line 4 of the pseudocode).
- Line 307: what is the role of the weight function w(x)?
- Line 437: remind the reader what C(G) is.
- Line 653: what are a and b?

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.): The paper presents an approximation algorithm for the partition function based on Chebyshev polynomials. The authors give a concrete bound on the quality of the approximation and a concrete tradeoff between running time and approximation quality.

Clarity - Justification: The paper is well explained and, although technically involved, not too hard to follow.

Significance - Justification: The technique of approximating integrals and the task of approximating the partition function in a provably meaningful way are both important and, as far as I can tell, novel.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): The paper is very interesting and treats an important ML problem.
Both the technique and the results should be presented to the ICML audience. I would like to see some elaboration on why this approach does not violate known computational hardness results. Is it because the norms of the weight vectors in the NP-hard instances are typically much larger than 1? There are works showing that even additive approximation of the partition function is hard (see for instance http://arxiv.org/abs/quant-ph/0702008).

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.): The authors describe an approximation scheme for computing the partition function of exponential family graphical models based on quadrature techniques. In addition, the authors provide two algorithms that come with both complexity and approximation guarantees. The quality guarantees degrade with the norm of the parameter vector. Experimentally, the approach does not appear to significantly outperform the naive mean field lower bound (even with moderately sized potentials).

Clarity - Justification: The paper, while relatively easy to read, contains a number of typos, missing bits of information, and oversimplifications. For example, in Definition 1, "j" is never defined or explained. In Figure 1, axis labels should be added to clarify the plots. While it is true that discrete state space models can all be expressed as in (3) and (4), this isn't required.

Significance - Justification: In terms of significance, I am not aware of extensive applications of quadrature to the problem of computing the partition function, but I'm also not sure how useful the described algorithms actually are. In particular, the complexity (at least in theory) scales poorly with the norm of the parameter vector. While the authors do attempt to show that small values of k yield reasonable approximations compared with existing methods, it seems that even k = 3 could be problematic in practice without a highly parallelized approach.
The fact that quadrature-based methods can be parallelized to such an extent is interesting, but it seems that all of that work is required just to barely compete with current methods, not to outperform them.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): Overall, I like the approach, but I feel that the introduction makes it sound more useful/powerful than the experiments demonstrate. However, there may exist applications to which this technique is ideally suited. If so, the authors should explain what these applications are and compare against the baseline approaches in that setting.

=====
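Editor's note: for readers unfamiliar with the quadrature idea the reviews discuss, here is a minimal one-dimensional NumPy sketch of Chebyshev (Clenshaw-Curtis style) integration, together with choosing the degree k from a target accuracy epsilon, the workflow Review #1 argues for. This is an illustrative sketch of the generic continuous technique only, not the paper's discrete algorithm; the function names and the coefficient-decay stopping rule are the editor's own.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def chebyshev_integral(f, k):
    """Integrate f over [-1, 1] via a degree-k Chebyshev interpolant.

    The interpolant's integral is available in closed form, since
    int_{-1}^{1} T_n(x) dx = 2 / (1 - n^2) for even n, and 0 for odd n.
    """
    c = C.chebinterpolate(f, k)       # Chebyshev coefficients of the interpolant
    n = np.arange(k + 1)
    w = np.zeros(k + 1)
    even = (n % 2 == 0)
    w[even] = 2.0 / (1.0 - n[even] ** 2)
    return float(c @ w)               # sum_n c_n * int T_n

def degree_for_tolerance(f, eps, k_max=64):
    """Pick the smallest degree whose trailing Chebyshev coefficients fall
    below eps, so that eps is the input and k is derived from it (a crude
    heuristic standing in for the paper's error analysis)."""
    for k in range(4, k_max):
        c = C.chebinterpolate(f, k)
        if np.max(np.abs(c[-2:])) < eps:
            return k
    return k_max
```

For smooth integrands the coefficients decay rapidly; for example, `chebyshev_integral(np.exp, 16)` recovers e - 1/e to near machine precision, and `degree_for_tolerance(np.exp, 1e-8)` settles on a small k. The norm-dependence the reviews worry about shows up here too: the larger the parameter scale inside the exponential, the more slowly the coefficients decay and the larger the k needed for a given eps.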