Paper ID: 146
Title: Variable Elimination in the Fourier Domain

Review #1
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): This paper introduces a new approach to compact representations of probability distributions by mapping them into the Fourier (frequency) domain. The key idea is an extension of the classical result of Linial et al., JACM 1993, from bounded CNFs to probabilistic graphical models. This relies on the key Hastad switching Lemma construction, a well known device in circuit complexity used to prove lower bounds on representation sizes. The application of the new representation to variable elimination is shown to produce beneficial improvements in convergence time with some standard experimental benchmarks. 

Clarity - Justification: This is a very well written paper, with a detailed and through analysis and explanation of all the key ideas. Figure 4 is hard to interpret in black and white, and it would help if the curves were distinguished not just by color, but also by dashed edges. 

Significance - Justification: This is a highly significant extension of past work on Fourier representations, from boolean representations like CNFs to probabilistic graphical models. The results on variable elimination are fairly convincing, but unfortunately the paper does not show the results on learning graphical models. It would be useful to see what the improvements are w.r.t. estimation of parameters in graphical models as well, if these are done in the Fourier domain.   The limitation imposed by Definition 2 needs to be explained in more detail. What sorts of functions does it rule out being able to represent? What is the consequence of that for representation of complex probability distributions. 

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): See above. 

=====

Review #2
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): Variable Elimination (VE) is a classic method for solving decision, optimization, and counting problems in graphs of small treewidth. Exact VE algorithms run in time exponential in treewidth, since the algorithm computes intermediate results that can be viewed as multidimensional tables, the maximum number of dimensions being the treewidth of the graph.   The present paper proposes an approximate algorithm for counting (integration) problems. It is based on the idea of representing the multidimensional tables in the Fourier space, throwing away terms with small Fourier coefficients. The approach is motivated by existing theoretical results telling that tables representable by low-depth decision trees or by bounded-width CNF (or DNF) formulas have a Fourier spectrum that concentrates on few coefficients. The paper also extends the latter connection to a class of weighted graphical models. Empirical results demonstrate that the proposed algorithms can rapidly find accurate approximations. 

Clarity - Justification: The presentation is somewhat sketchy. There are undefined terms or typos in several places. For instance, "V = {x_i | i \in V}" on line 163 doesn't make sense to me. And what is "S" on line 169? Listing all such flaws would amount to a long list, compiling of which is beyond this review.   There also positive aspects: rigorous statements as mathematical theorems; examples that help the reader; figures and tables are fine; the experiments are well reported and reproducible.  A negative aspect, hampering clarity, is the treatment of the Minibucket scheme. Namely, the paper does not clearly mention that the Minibucket scheme is able to equip the produced approximation with an accuracy guarantee. This is a very significant difference to the proposed method, as well as to belief propagation. In that sense, the comparison to the Minibucket scheme is not fair. 

Significance - Justification: The work is very interesting and could eventually be published at a top venue. However, in my opinion, the paper should undergo a major revision before publication.  On the positive side: + The proposed approximation algorithms seem novel. + The empirical results look very good.  On the negative side: - The Fourier approach is not new in itself; it has been used in exact computations; see Detailed comments. - The proposed methods are not able to bound the approximation error; so larger empirical studies are necessary -- however, see my response to the author rebuttal below.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): Related exact algorithms exploting Fourier/Möbius transformation are given, e.g., in   1) Johan M. M. van Rooij, Hans L. Bodlaender, Peter Rossmanith: Dynamic Programming on Tree Decompositions Using Generalised Fast Subset Convolution. ESA 2009: 566-577  The following paper looks also very relevant and could be briefly discussed:   2) David Brodie Smith, Vibhav Gogate: The Inclusion-Exclusion Rule and its Application to the Junction Tree Algorithm. IJCAI 2013  Currently the paper does not cite these works.  

=====

Review #3
=====
Summary of the paper (Summarize the main claims/contributions of the paper.): This paper proposes efficient variable elimination for distributions over binary variables in Fourier domain, for distributions with compact Fourier representation.   Linial's theorem links the width of a CNF formula to the concentration of its Fourier spectrum. The main contribution is to extend the theorem to functions f: {-1,1}^n -->R+ that are "contractive gap".  Existing results link the highest degree d of nonzero Fourier coefficients of a function f(x) to its representability by a decision tree of depth d. The authors link the existence of a decision tree "close" to a restriction of f(x) to the Fourier spectrum of f.  The results are applied to estimating the partition function of undirected graphical models, and seem to outperform other methods.

Clarity - Justification: The paper is well written for the most part. The high-level overview of the proof of Lemma 1 has a strange wording in a few places (e.g. line 423), and uses terms defined in supplementary material. The proof in the supplementary materials was also not easy to follow. I also suggest defining width more clearly in Defn 2.

Significance - Justification: The contribution is nontrivial, and the results seem useful. 

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.): I think this is an interesting contribution. I have one general comment. As the authors say, conditional independence cannot be exploited to simplify inference in models such as decision trees. However, context-specific independence can, and I imagine that decision trees would be easy for cutset conditioning. Since the results in the paper depend on the existence of a decision tree that is close to the original function, perhaps the problems in experimental evaluation would be easy for algorithms exploiting context-specific independencies, which are not evaluated.

=====