Oral
Making Convolutional Networks Shift-Invariant Again
Richard Zhang
Modern convolutional networks are not shift-invariant, despite their convolutional nature: small shifts in the input can cause drastic changes in the output. Commonly used downsampling methods, such as max-pooling, ignore the classical sampling theorem. The well-known signal processing fix is to apply a low-pass filter before downsampling. However, previous work has assumed that including such an anti-aliasing filter necessarily \textit{excludes} max-pooling. We show that when integrated correctly, the two operations are in fact \textit{compatible}. The technique is general and can be incorporated into other layer types, such as average-pooling and strided convolution, and other applications, such as image classification and image-to-image translation. In addition, engineering the inductive bias of shift-equivariance largely removes the need for shift-based data augmentation at training time. Our results demonstrate that this classical signal processing technique has been overlooked in modern networks.
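
To make the decomposition concrete, below is a minimal PyTorch sketch of anti-aliased max-pooling: strided max-pooling is split into a dense (stride-1) max followed by a fixed low-pass blur and subsampling. The 3-tap binomial [1, 2, 1] filter and the names BlurPool2d and antialiased_max_pool are illustrative assumptions for this sketch, not the paper's released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Fixed binomial low-pass filter followed by subsampling (a sketch)."""
    def __init__(self, channels, stride=2):
        super().__init__()
        self.stride = stride
        self.channels = channels
        k = torch.tensor([1., 2., 1.])      # assumed 3-tap binomial filter
        k2d = torch.outer(k, k)
        k2d = k2d / k2d.sum()               # normalize to preserve the signal mean
        # one copy of the kernel per channel, applied depthwise
        self.register_buffer("kernel", k2d.repeat(channels, 1, 1, 1))

    def forward(self, x):
        # reflect-pad so filtering does not shrink the feature map before striding
        x = F.pad(x, (1, 1, 1, 1), mode="reflect")
        return F.conv2d(x, self.kernel, stride=self.stride, groups=self.channels)

def antialiased_max_pool(channels):
    # MaxPool(stride 2) -> Max(stride 1) + BlurPool(stride 2):
    # evaluate the max densely, then low-pass filter before subsampling.
    return nn.Sequential(nn.MaxPool2d(kernel_size=2, stride=1),
                         BlurPool2d(channels, stride=2))

Under the same idea, average-pooling reduces to the blur-then-subsample module alone, and a strided convolution becomes the same convolution at stride 1 followed by blurred subsampling.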