Paper ID: 871
Title: Exploiting Cyclic Symmetry in Convolutional Neural Networks

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper discusses types of layers that can be introduced into convolutional neural networks in order to make them partially equivariant to rotations.

Clarity - Justification:
The motivation and the proposed solution are presented in a simple and clear manner. The paper provides a good overview of existing and related work. The methods are clearly presented, especially the notions of symmetry and equivariance.

Significance - Justification:
The paper includes experimental results on interesting datasets, which demonstrate that the proposed methods can be helpful on datasets with rotational symmetry. The paper focuses on rotations by multiples of pi/2, which, as the experiments demonstrate, is an important type of symmetry. In my opinion the contribution would be more insightful and interesting if the discussion and experiments were extended to rotations by angles other than multiples of pi/2. These cases seem to be dismissed too quickly, solely on the grounds that they require interpolation.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The proposed methods are interesting and well presented. I would have liked a more detailed discussion of how these methods actually contribute to solving the rotation-invariance problem, as opposed to simply using data augmentation.

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This work proposes a method of encoding certain kinds of symmetries into convolutional neural networks (CNNs), restricting the learnable models to those that are equivariant under the symmetry transformations. The symmetries considered are rotations by multiples of 90 degrees and reflections in the x and y axes. New kinds of CNN layers are proposed that encode these symmetries into the structure of the network.

Clarity - Justification:
The paper is not hard to follow.

Significance - Justification:
I think this paper is quite lacking in significance. As the authors' own literature review shows, this area of CNN design has been very thoroughly explored. In particular, the idea of symmetry is not new, nor is its encoding into network architectures, nor even the use of rotational and mirror symmetry. At best, the contribution of the paper is to explore a particular specialized point in the design space when the space as a whole has already been broadly characterized. In other words, the use of this sort of simple discrete symmetry would not surprise any practitioner, and any such practitioner would arrive at a network architecture very similar to the one proposed. The interesting advances to be made involve non-trivial symmetries, starting with something as simple as rotations by arbitrary angles (and moving on, perhaps, to arbitrary scalings and affine transformations). The work of Gens and Domingos (2014) cited in the paper goes in this direction, but a lot more remains to be done.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
To summarize, I think the paper is fairly clear and easy to read, but it is not significant or novel enough for ICML. In general, groups that permute the input pixels of an image in as uniform a way as rotation, mirroring, or indeed translation are trivial to encode as symmetries of a convolutional network. The particular data layout used to represent them follows very naturally, and could almost be considered an implementation detail. Unfortunately, that is the only aspect of this paper that constitutes a concrete technical suggestion that is also (arguably) novel.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper presents a method for building convolutional neural networks that are invariant or equivariant to 90-degree rotations of the input data. Traditional CNNs are augmented with cyclic slicing, pooling, rolling, and stacking layers that operate over the minibatch dimension of the feature maps; these layers are easy to implement and to insert into existing architectures. Experiments on three benchmark datasets confirm the utility of the proposed method.

Clarity - Justification:
The paper is well-written and easy to follow.

Significance - Justification:
The proposed method is significant because it can easily be applied to convolutional networks of any architecture when rotational invariance or equivariance is desired.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper is well-written, and the method is clean, simple, and general; however, I have some concerns about the experiments. I appreciate the error bars on all experiments; are these the result of different train/val splits, or simply of different random initializations on the same split? Although the text discusses slice, pool, roll, and stack layers, it appears that no experiments are performed using stack layers. Is there some justification for this? In all experiments, it appears that cyclic slicing is only used at the input and cyclic pooling just before the output; in the absence of rolling layers, this seems very similar to training a traditional CNN on minibatches of rotated inputs, except that predictions are averaged across rotations. Are there benefits to inserting slicing and pooling layers at internal layers of the network? Did you perform any experiments to this effect? The Kaggle datasets used for the experiments are slightly nonstandard; this is fine, but the authors do not cite any prior work on these datasets to show how their approach compares to the existing state of the art. Given that these datasets are from Kaggle competitions, how does the proposed method compare to the top-performing contest entries?

=====
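Review #3 describes cyclic slicing and pooling layers that operate over the minibatch dimension: each input is replaced by its four 90-degree rotations stacked along the batch axis, and predictions are later pooled across those four copies. The snippet below is a minimal NumPy sketch of that idea, assuming a (batch, channel, height, width) layout; the function names and the choice of mean-pooling are illustrative assumptions, not the authors' implementation.

import numpy as np

def cyclic_slice(x):
    # Stack the four 90-degree rotations of every example along the batch
    # axis: (N, C, H, W) -> (4N, C, H, W). A sketch of the "cyclic slicing"
    # idea described in the reviews; layout and naming are assumptions.
    return np.concatenate([np.rot90(x, k, axes=(2, 3)) for k in range(4)], axis=0)

def cyclic_pool(x):
    # Average over the four rotated copies produced by cyclic_slice:
    # (4N, ...) -> (N, ...). Mean-pooling is one plausible choice here.
    n = x.shape[0] // 4
    return x.reshape(4, n, *x.shape[1:]).mean(axis=0)

# Toy usage: slice a minibatch, run it through a stand-in for a CNN,
# then pool the per-rotation outputs back to one prediction per example.
batch = np.random.rand(8, 3, 32, 32)
sliced = cyclic_slice(batch)            # shape (32, 3, 32, 32)
logits = sliced.mean(axis=(1, 2, 3))    # stand-in for network outputs, shape (32,)
pooled = cyclic_pool(logits)            # shape (8,)
print(sliced.shape, pooled.shape)

Under this reading, omitting the rolling layers reduces the scheme to exactly what Review #3 notes: an ordinary CNN applied to rotated copies of the input, with predictions averaged across rotations at the output.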