Reviewer 1: Thank you for the careful evaluation of our paper and the very constructive feedback.

The fact that Minibucket is able to provide upper/lower bounds on the partition function is an excellent point, and we will note this in the revised version. The bounds indeed provide an additional guarantee on the quality of the MBE results. In subsequent work, we plan to add similar bounding guarantees for the Fourier representation. The MBE implementation we used in the paper gives upper bounds; we will revise the paper to elaborate on this aspect of MBE.

Overall, we do not intend to claim that the Fourier approach should replace a Minibucket strategy. We view these approaches as complementary, each with different strengths, and in fact we are actively working on combining the two. We will include a discussion of this issue in the paper, with further pointers to the MBE literature and to work on bounds.

Because of the worst-case hardness results for relative-error bounds on #P problems [Dechter 2011], we compare the different methods using an empirical evaluation. Our comparison includes large instances from the UAI inference challenge. In Table 2, the grids2/90* instances have on average 855 variables and a treewidth of 171; the mastermind_06* instances have 1814 variables and treewidth 138; and the mastermind_10* instances have 2606 variables and treewidth 1772. The treewidth values are approximate, computed with the state-of-the-art solver quickbb [Gogate, Dechter 2004]. Both Minibucket and Fourier can solve instances exactly up to a treewidth of roughly 20; because our test instances have much larger treewidths, they provide a good challenge for both methods.

Presentation suggestions: Thanks! We will carefully examine all paper details. It should be V = {x_i | i = 1, …, N} in line 163, and S = {x_{i_1}, …, x_{i_k}} in line 169. Also, thanks for the additional pointers to relevant work; those will be included in the revised version. The three listed papers are complementary to ours.
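To make the upper-bounding property of MBE concrete, here is a minimal sketch (not our implementation; the toy model and all factor values are made up for illustration). On a three-variable pairwise model, the bucket of x1 is split into two mini-buckets, one eliminated by summation and the other by maximization, which yields an upper bound on the partition function:

```python
import itertools

# Toy pairwise model over binary variables x1, x2, x3 (values 0/1).
# Factor tables use arbitrary positive numbers, chosen for illustration.
vals = list(itertools.product([0, 1], repeat=2))
f12 = dict(zip(vals, [1.0, 2.0, 0.5, 3.0]))
f13 = dict(zip(vals, [2.0, 1.0, 1.5, 0.2]))
f23 = dict(zip(vals, [0.7, 1.3, 2.0, 1.0]))

# Exact partition function by brute-force enumeration.
Z = sum(f12[x1, x2] * f13[x1, x3] * f23[x2, x3]
        for x1, x2, x3 in itertools.product([0, 1], repeat=3))

# Mini-bucket step for x1: its bucket {f12, f13} is split into two
# mini-buckets. One is eliminated by summation, the other by maximization,
# using  sum_x f(x) g(x) <= (sum_x f(x)) * (max_x g(x))  for f, g >= 0.
lam12 = {x2: sum(f12[x1, x2] for x1 in [0, 1]) for x2 in [0, 1]}  # sum-eliminate
lam13 = {x3: max(f13[x1, x3] for x1 in [0, 1]) for x3 in [0, 1]}  # max-eliminate

# Finish with exact summation over the remaining variables.
Z_ub = sum(lam12[x2] * lam13[x3] * f23[x2, x3]
           for x2, x3 in itertools.product([0, 1], repeat=2))

print(Z, Z_ub)
assert Z_ub >= Z  # the mini-bucket estimate upper-bounds Z
```

Replacing the max-elimination with a min gives the corresponding lower bound, which is the guarantee we plan to mirror for the Fourier representation.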
We propose a novel compact representation based on discrete Fourier representations, complementing classical factorized representations. Our main contribution is to show, both theoretically and experimentally, that a Fourier representation with a limited number of terms can provide a good approximation for a large class of functions, which is useful in approximate inference. Theorem 5 provides a theoretical guarantee on the quality of the Fourier approximation with bounded degree. Notice that the weight concentration result covers a wide class of functions, including 3-CNFs and 3-DNFs. The novelty of our work therefore lies in the use of Fourier representations to approximate functions in probabilistic inference.

Papers 1) and 2) from Review 1 provide fast algorithms to compute convolutions, speeding up message computation in exact inference; we will study these techniques for possible use in our approximation approach. The change of basis in paper 3) is used in exact inference to speed up complex marginal queries; again, it may be useful in approximate inference too. We will discuss these papers in the revised paper.

Reviewer 2: Thank you for your constructive feedback.

Indeed, the example in Figure 1 does not rule out context-specific independence, such as cutset conditioning. However, context-specific independence is a broad term with many nuances. For example, although Sentential Decision Diagrams and Ordered Binary Decision Diagrams both exploit context-specific independence, the former can be exponentially smaller for a particular function. For a function with a compact Fourier representation, Theorem 3 states that it can be represented by the *sum* of shallow decision trees. This is different from a single decision tree, so cutset conditioning may not apply. We will consider this further. We agree that it is important to compare with a solver exploiting context-specific independence. Ace (which exploits it) runs much slower than our algorithm.
However, the comparison is not fair, since Ace is exact. We are looking for an approximate counter that exploits context-specific independence. We thank Reviewer 2 for the helpful suggestion to improve the clarity of the paper by including further intuitions behind the proofs; we will do so in the revised version.

Reviewer 3: Thank you for your constructive feedback.

We will update the figures so that the curves can be differentiated in black and white. We will further illustrate the conditions in Definition 2 in the revised version; the conditions are used in the proof of Theorem 5. Empirically, we observe that functions with small gaps also demonstrate weight concentration, which suggests that there is room to further generalize Theorem 5. We are actively looking into this. Indeed, we are working on learning graphical models with a bounded Fourier representation. The Fourier form provides a general compact knowledge representation, which should have significant applications beyond probabilistic model counting.
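As a concrete illustration of the weight concentration behind our low-degree Fourier approximations, here is a minimal brute-force sketch (the example 3-CNF and the ±1 encoding are illustrative assumptions, not taken from the paper). It computes all Walsh-Fourier coefficients of a small formula, verifies Parseval's identity, and reports how much Fourier weight sits on terms of degree at most 2:

```python
import itertools

# A small example 3-CNF over n = 4 variables, encoded on {-1, 1}^n,
# where x[i] == -1 means variable i is "True" (illustrative convention).
n = 4
def f(x):
    c1 = (x[0] == -1) or (x[1] == -1) or (x[2] == -1)
    c2 = (x[1] == 1) or (x[2] == -1) or (x[3] == 1)
    c3 = (x[0] == 1) or (x[2] == 1) or (x[3] == -1)
    return float(c1 and c2 and c3)

points = list(itertools.product([-1, 1], repeat=n))

def chi(S, x):
    # Parity (character) function for the subset S of variable indices.
    p = 1.0
    for i in S:
        p *= x[i]
    return p

# Fourier coefficient: hat_f(S) = E_x[f(x) * chi_S(x)].
coeffs = {}
for k in range(n + 1):
    for S in itertools.combinations(range(n), k):
        coeffs[S] = sum(f(x) * chi(S, x) for x in points) / len(points)

# Parseval: total Fourier weight equals E[f^2].
total = sum(c * c for c in coeffs.values())

# Weight concentration: fraction of weight on terms of degree <= 2.
low = sum(c * c for S, c in coeffs.items() if len(S) <= 2)
print(low / total)
```

Truncating the expansion to such low-degree terms is exactly the kind of bounded-degree approximation that Theorem 5 addresses, and it is the structure we hope to exploit when learning graphical models with a bounded Fourier representation.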