We thank the reviewers for their comments.

With regards to the significance of our work: while we agree that this area of CNN design has already been explored thoroughly, and we have done our best to provide a fairly complete overview of this work in our related work section, we believe that ours is the first to demonstrate a network architecture that exhibits true equivariance (i.e. not approximate) by design, while also being practical to implement and scalable to large input data. Most prior work in this area is either impractical to scale to realistically sized datasets (both in terms of the number of examples and the size of each individual example image), or does not ensure true equivariance. Hence, we do not feel that any prior work sufficiently encompasses ours. We also feel that part of the merit of our work is to provide a simple framework that makes building rotation-equivariant nets an accessible goal for everyday practitioners.

We also wish to push back on the suggestion that any practitioner wishing to exploit rotational symmetry could trivially come up with our architectural approach. It is not straightforward to devise and implement a specific set of operations that is guaranteed to preserve equivariance all the way through the network: once a layer anywhere in the network relinquishes equivariance, no later layer can recover it. The fact that our approach has not been published previously, despite the high level of activity in this field over the past four years, also attests to this.

We are very interested in generalizing our work to arbitrary rotation angles, and perhaps to other types of transformations, but practical applicability was a very important criterion for us. As a consequence, we restricted ourselves to a set of transformations that can be computed quickly and efficiently, without requiring any interpolation or padding to align the feature maps.
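To make the efficiency point concrete: rotations by multiples of 90 degrees only permute pixels, so they are exact and lossless. The following minimal NumPy sketch (illustrative only, not taken from our implementation) shows this:

```python
import numpy as np

# np.rot90 rotates by k * 90 degrees via index manipulation alone,
# so no interpolation or padding is ever needed.
x = np.arange(9, dtype=np.float32).reshape(3, 3)

# The four orientations of the input form the cyclic group C4.
rots = [np.rot90(x, k) for k in range(4)]

# Applying the 90-degree rotation four times returns the original
# array exactly -- no accumulation of interpolation error.
assert np.array_equal(np.rot90(rots[3], 1), x)
```

This exactness is what allows equivariance to hold by construction rather than approximately.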
This restriction is key in making the resulting framework practically useful, which we feel is part of the merit of our work, as stated earlier. We will amend the paper to make our justification for this restriction clearer, and we do hope to explore generalizations in future work.

With regards to experiments: the error bars on all experiments reflect different random initializations (we will state this more clearly). For "plankton" and "galaxies", the nature of the datasets meant that we could not choose the split ourselves: the test data was never released publicly, so evaluation is only possible through the Kaggle system itself. We chose to follow this evaluation protocol in the paper, to ensure that the reported scores are comparable to those of the competition participants. For the Massachusetts buildings dataset, a canonical train/valid/test split was also provided.

The stack layer is useful in a context where global equivariance is not desirable, so that at some point in the network it should be relinquished. When the feature maps obtained from the different orientations are simply stacked, the next layer has separate parameters for each orientation and is therefore no longer equivariant. This could be desirable for natural image datasets, for example, where global structure is not rotation equivariant (i.e. many objects and scenes have a canonical orientation due to gravity). We did not use it in our experiments because full equivariance is desirable for all three datasets considered. Nevertheless, it was necessary to define the stacking operation in order to define the rolling operation in a straightforward manner.

While it does not really make sense to insert a slicing layer anywhere other than at the input of the network (because almost any type of layer placed before it would already relinquish equivariance), there is more flexibility w.r.t. the position of the pooling layer. It could be interesting to place it before all dense layers, for example.
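To illustrate how equivariance is preserved until it is deliberately relinquished, the sketch below mimics slicing and orientation pooling in NumPy. The names `cyclic_slice` and `cyclic_pool` are ours for this sketch only, and the identity feature extractor stands in for an arbitrary equivariant stack of layers; the actual implementation in the paper differs in detail:

```python
import numpy as np

def cyclic_slice(x):
    # Replicate the input in all four 90-degree orientations.
    return [np.rot90(x, k) for k in range(4)]

def cyclic_pool(maps):
    # Undo each rotation, then pool over orientations: the result is
    # equivariant to rotations of the input.
    aligned = [np.rot90(m, -k) for k, m in enumerate(maps)]
    return np.mean(aligned, axis=0)

x = np.random.rand(4, 4)
out = cyclic_pool(cyclic_slice(x))
out_rot = cyclic_pool(cyclic_slice(np.rot90(x)))

# Rotating the input rotates the pooled output in the same way.
assert np.allclose(np.rot90(out), out_rot)
```

Stacking, by contrast, concatenates the four maps along the channel axis instead of pooling them, so the next layer assigns separate parameters to each orientation slot and equivariance is relinquished from that point on.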
We have experimented with a few such configurations and found the one used in the paper to be the most effective -- unfortunately there was not enough room to include comparison experiments.

We did not cite much prior work on the Kaggle datasets we used, since there are very few papers about them (except Dieleman et al. 2015, which we did cite). In designing our baseline networks, we made sure they were competitive: they would rank 12/327 (galaxies) and 57/1050 (plankton) on the respective leaderboards. We feel that this is quite reasonable, considering that most participants used extensive model averaging to obtain their results. We will amend the paper to clarify some of these points.