Are Large Kernels Better Teachers than Transformers for ConvNets?
Tianjin Huang · Lu Yin · Zhenyu Zhang · Li Shen · Meng Fang · Mykola Pechenizkiy · Zhangyang “Atlas” Wang · Shiwei Liu

Wed Jul 26 02:00 PM -- 03:30 PM (PDT) @ Exhibit Hall 1 #208

This paper reveals a new appeal of the recently emerged large-kernel Convolutional Neural Networks (ConvNets): as the teacher in Knowledge Distillation (KD) for small-kernel ConvNets. While Transformers have led state-of-the-art (SOTA) performance in various fields with ever-larger models and labeled data, small-kernel ConvNets are considered more suitable for resource-limited applications due to the efficient convolution operation and compact weight sharing. KD is widely used to boost the performance of small-kernel ConvNets. However, previous research shows that distilling knowledge (e.g., global information) from Transformers to small-kernel ConvNets is not very effective, presumably due to their disparate architectures. We hereby carry out a first-of-its-kind study unveiling that modern large-kernel ConvNets, a compelling competitor to Vision Transformers, are remarkably more effective teachers for small-kernel ConvNets, due to their more similar architectures. Our findings are backed up by extensive experiments on both logit-level and feature-level KD "out of the box", with no dedicated modifications to the architecture or the training recipe. Notably, we obtain the best-ever pure ConvNet under 30M parameters with 83.1% top-1 accuracy on ImageNet, outperforming current SOTA methods including ConvNeXt V2 and Swin V2. We also find that beneficial characteristics of large-kernel ConvNets, e.g., larger effective receptive fields, can be seamlessly transferred to students through this large-to-small kernel distillation. Code is available at: https://github.com/VITA-Group/SLaK.
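For readers unfamiliar with logit-level KD, the following is a minimal sketch of the commonly used temperature-scaled distillation loss. It is an illustration only and is not taken from the paper or the SLaK repository; the function names `softmax` and `kd_loss`, and the default values of `temperature` and `alpha`, are assumptions chosen for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Generic logit-level KD loss for one sample (hypothetical helper).

    Combines KL(teacher || student) on temperature-softened distributions
    with the usual cross-entropy on the ground-truth label. The defaults
    for temperature and alpha are illustrative assumptions.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # Cross-entropy of the student's unsoftened prediction against the label.
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-target gradient scale comparable across T.
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce
```

In this sketch the teacher would be a large-kernel ConvNet and the student a small-kernel ConvNet; feature-level KD, which the abstract also mentions, instead matches intermediate activations and is not shown here.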

Author Information

Tianjin Huang (Eindhoven University of Technology)
Lu Yin (Eindhoven University of Technology)
Zhenyu Zhang (University of Texas at Austin)
Li Shen (JD Explore Academy)
Meng Fang (University of Liverpool)
Mykola Pechenizkiy (TU Eindhoven)
Zhangyang “Atlas” Wang (University of Texas at Austin)
Shiwei Liu (UT Austin)

Shiwei Liu is a Postdoctoral Fellow at the University of Texas at Austin. He obtained his Ph.D. from the Eindhoven University of Technology in 2022. His research interests cover sparsity in neural networks and efficient ML. He has over 30 publications in top-tier machine learning venues, such as IJCAI, ICLR, ICML, NeurIPS, IJCV, UAI, and LoG. Shiwei won the best paper award at the LoG’22 conference and received the Cum Laude distinction (distinguished Ph.D. thesis) from the Eindhoven University of Technology. He has served as an area chair for ICIP’22 and ICIP’23 and as a PC member for almost all top-tier ML/CV conferences. Shiwei has co-organized two tutorials, at IJCAI and ECML-PKDD, which were widely acclaimed by the audience. He has also given more than 20 invited talks at universities, companies, research labs, and conferences.