We thank the reviewers for their time and feedback. Comments are addressed below, per assigned reviewer.

*** Assigned_Reviewer_1 ***

Comment 1
-------------
The formal definitions of a tensor product and a tensor require an abstract algebraic discussion - see sec 3.2 in the book by Hackbusch ([A]). For our purposes, it suffices to consider the special case of real multi-dimensional arrays and the operator defined in eq 1. This reduction, formally justified by an isomorphism as presented in sec 3.2 of [A], resembles the reduction from a general vector space to R^n. We use the term "tensor product" rather than "outer product" since the latter sometimes refers specifically to an operator between vectors in Euclidean space (1-dimensional arrays). A worked array-level example is given at the end of this response.

[A] W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Feb 2012

Comment 2
-------------
There seems to be a misunderstanding. The operator g() takes two scalars as input, not tensors, and is commutative by definition. The generalized tensor product (defined in eq 2) makes use of g(), but the two operators are inherently different. In particular, the generalized tensor product operates on tensors and is not commutative. A small example illustrating this distinction is given at the end of this response.

Comment 5
-------------
Here too there is likely a misunderstanding. Our entire analysis (sec 5) is based on the generalized tensor decompositions. For example, when proving depth efficiency we typically show that tensors created by the generalized CP decomposition (shallow network) have matricization rank that grows only linearly, whereas tensors created by the generalized HT decomposition (deep network) have matricization rank that is exponentially large. A sketch of the matricization rank we refer to is given at the end of this response.

*** Assigned_Reviewer_2 ***

We thank the reviewer for the support. With regard to patches, templates, and grid tensors, we will integrate text from app A into sec 4 to explain these concepts in more detail.

*** Assigned_Reviewer_5 ***

We thank the reviewer for the support and for the useful suggestions, which will be taken into account in the final version of the manuscript. Some clarifications follow.

"Clarity - Justification" textbox
-------------------------------------
All claims presented in the paper are facts. The only conjecture that is not completely proven relates to the prevalence of depth efficiency under ReLU activation and max pooling. This is mentioned in the text between claims 8 and 9, and discussed in detail in app D.

"Detailed comments" textbox
-----------------------------------
PARAGRAPH 1: Indeed, one may consider depth efficiency under non-universality. The drawback of this approach is that in the typical case, at least in our framework, a non-universal shallow network cannot realize functions generated by an analogous deep network, no matter how large we allow it to be. Arguably, this provides little insight into the representational power of depth, as it does not quantify the complexity of functions generated by a deep network.

PARAGRAPH 2: Our analysis begins with the setting of unshared weights because it is easier to follow and corresponds more directly to the CP and HT decompositions. We discuss the setting of shared weights briefly in sec 5.3, and in detail in app B. Following the reviewer's feedback, we will broaden the discussion in the body of the paper by integrating text from app B into sec 5.3.
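
*** Worked examples referenced above ***

For Reviewer 1, Comment 1 - a minimal sketch of the array-level operator we have in mind, with illustrative indices (the precise form is eq 1 of the paper): for arrays $\mathcal{A}$ of order $P$ and $\mathcal{B}$ of order $Q$,

\[
(\mathcal{A} \otimes \mathcal{B})_{d_1, \ldots, d_P, e_1, \ldots, e_Q}
  = \mathcal{A}_{d_1, \ldots, d_P} \cdot \mathcal{B}_{e_1, \ldots, e_Q},
\]

which for vectors ($P = Q = 1$) reduces to the familiar outer product $(u \otimes v)_{ij} = u_i v_j$.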
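
For Reviewer 1, Comment 2 - an illustration of why commutativity of g() does not carry over to the generalized tensor product (values illustrative; the formal definition is eq 2 of the paper). Take $g = \max$ and the vectors $u = (1, 2)$, $v = (3, 4)$. Then $(u \otimes_g v)_{ij} = g(u_i, v_j)$ gives

\[
u \otimes_g v = \begin{pmatrix} 3 & 4 \\ 3 & 4 \end{pmatrix},
\qquad
v \otimes_g u = \begin{pmatrix} 3 & 3 \\ 4 & 4 \end{pmatrix},
\]

so the two orderings produce different tensors even though $g(a, b) = g(b, a)$ for all scalars a, b.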
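
For Reviewer 1, Comment 5 - a short numerical sketch of the matricization rank underlying the depth-efficiency argument. The snippet below is illustrative only (the function name and the odd/even grouping of modes are stated here for concreteness; the formal construction is given in sec 5 of the paper): it reshapes a tensor into a matrix by grouping half of its modes into rows and the other half into columns, and reports the rank of that matrix.

import numpy as np

# Illustrative matricization rank: group odd modes into rows and even modes
# into columns, then take the rank of the resulting matrix.
def matricization_rank(A):
    odd = list(range(0, A.ndim, 2))   # modes 1, 3, 5, ... (0-based 0, 2, 4, ...)
    even = list(range(1, A.ndim, 2))  # modes 2, 4, 6, ... (0-based 1, 3, 5, ...)
    rows = int(np.prod([A.shape[i] for i in odd]))
    cols = int(np.prod([A.shape[i] for i in even]))
    return np.linalg.matrix_rank(A.transpose(odd + even).reshape(rows, cols))

# Example: a random order-4 tensor with all modes of dimension 3;
# its matricization is a 9 x 9 matrix, so the rank is at most 9.
print(matricization_rank(np.random.rand(3, 3, 3, 3)))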