We thank the reviewers for their comments. We first discuss the Kimmig et al. technical report and then address each reviewer directly.$
The main result of Kimmig et al. (Thm. 2) states that semiring-generalized circuits that are both decomposable AND deterministic can be summed in time linear in their size. Any circuit that is deterministic and decomposable will in general be exponentially larger than a decomposable circuit that computes that same function [Darwiche &amp; Marquis 2002]. This is what we mean by their results being exponentially weaker. Further, it appears that their formulation is not equivalent to ours, since they give a simple example where they show that determinism is necessary for sound probabilistic inference, whereas the sum-product theorem (and all work on SPNs) proves that it is not necessary. Thm. 3 shows that determinism can be relaxed if the sum is idempotent, but this just replaces one unnecessary strong restriction with another that does not hold for many interesting domains (such as probabilistic inference). Finally, Kimmig et al. does not deal with learning and there is only partial overlap between the domains considered by each paper.

Reviewer 1.
We have limited space so will discuss only the most glaring inaccuracies.

This is a very inaccurate summary of the relevant literature. Arithmetic circuits (ACs) are a formalism used for decades by theorists to model computation (see references in Shpilka &amp; Yehudayoff 2010). Darwiche (2002) used ACs as a data structure for inference in Bayesian networks (BNs). Building on this, Poon &amp; Domingos (2011) introduced SPNs, a new class of probabilistic model. They were the first to establish tractability conditions for probabilistic models of this type. None were previously defined for Darwiche-ACs, which were solely a compilation target for BNs, and not a representation in their own right. Poon &amp; Domingos were the first to learn SPNs; Lowd &amp; Domingos (2008) only learned BNs regularized by the size of the compiled AC. SPNs are thus the most relevant model to build from.

Rooshenas &amp; Lowd (2014) only show equivalence between discrete SPNs and discrete Darwiche-ACs that satisfy the SPN tractability conditions (Darwiche-ACs are otherwise intractable for summation); Darwiche-ACs on continuous variables by definition require an infinite number of leaves - your comment does not solve this. Contrary to what the review states, R&amp;L in no way show that translation to smooth deterministic ACs can be done in linear time. Again, this takes exponential time [Darwiche &amp; Marquis 2002] (the difference is similar to probabilistic models with/without hidden variables).

Our goal is not to go into fine detail on these algorithms, as many previous papers have, but to show the commonalities and provide a unification across domains, thus making it possible to apply them to new problems.

The sum-product theorem (SPT) holds; the proof is not complicated, but the result is powerful. The SPF + BDD semiring definition is incomplete, so it is difficult to know exactly what is meant, but it seems unlikely to satisfy the properties of a semiring or the conditions of the SPT, since 1) BDDs are not decomposable (they are deterministic) so the SPT does not apply and 2) the SPT is about summing a decomposable SPF, not about evaluating one nor about transforming circuits. Further, if the semiring is on the set of all BDDs then the leaves must be univariate functions (on an unspecified domain) that output BDDs. Evaluating this SPF outputs a BDD composed from the leaf BDDs. It is not clear that summing this SPF is meaningful but it certainly does not require converting DNNF to d-DNNF as you state.

We discuss many papers containing tractable high-treewidth inference and in no way claim to be the first; instead, we present a general unified framework that encompasses this existing work and some novel extensions. Our paper is novel and significant in the same way that Dechter’s "Bucket elimination: A unifying framework for reasoning" is: it did not invent variable elimination, but it greatly increased its scope.

Reviewer 2.
We know of proofs of the tractability of CSPs compiled into d-DNNF but have not seen a proof that decomposability is sufficient (as in Cor. 4), although it would not surprise us if one exists; if you know a citation we’re happy to include it.

Only connections between those product nodes ci that correspond to cliques that are connected (in the JT) to the separator for sij are created. We will clarify this in the paper.

Reviewer 3.
The arg- interpretation only applies to operators for which it is meaningful; as you note, sum is not one of these.

For SAT, one example is improving existing rule learning systems. These currently just learn large KBs of arbitrary rules and hope that inference will be tractable. In comparison, LearnSPF would learn a KB of rules structured so that inference is tractable.