Spatial-Channel Token Distillation for Vision MLPs
Yanxi Li · Xinghao Chen · Minjing Dong · Yehui Tang · Yunhe Wang · Chang Xu

Thu Jul 21 03:00 PM -- 05:00 PM (PDT) @ Hall E #421

Recently, neural architectures built entirely from multi-layer perceptrons (MLPs) have attracted great research interest from the computer vision community. However, inefficient mixing of spatial and channel information causes MLP-like vision models to require tremendous pre-training on large-scale datasets. This work addresses the problem from a novel knowledge distillation perspective. We propose Spatial-channel Token Distillation (STD), which improves information mixing in both dimensions by introducing distillation tokens to each of them. A mutual-information regularization further encourages the distillation tokens to focus on their specific dimensions, maximizing the performance gain. Extensive experiments on ImageNet with several MLP-like architectures demonstrate that the proposed token distillation mechanism efficiently improves accuracy. For example, STD boosts the top-1 accuracy of Mixer-S16 on ImageNet from 73.8% to 75.7% without any costly pre-training on JFT-300M. When applied to stronger architectures, e.g., CycleMLP-B1 and CycleMLP-B2, STD still harvests about 1.1% and 0.5% accuracy gains, respectively.
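The core idea above, inserting a distillation token into each of the two mixing dimensions of an MLP-Mixer-style token matrix, can be illustrated with a shape-level sketch. This is a minimal NumPy illustration of the token-insertion geometry only, not the authors' implementation: the token values, sizes (Mixer-S/16-like), and variable names here are assumptions, and the actual STD method additionally trains these tokens against a teacher with the mutual-information regularizer described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

S, C = 196, 512  # spatial tokens x channels (Mixer-S/16-like sizes, for illustration)
x = rng.standard_normal((S, C))

# Hypothetical learnable distillation tokens (random here; learned in STD):
spatial_token = rng.standard_normal((1, C))       # extra row, mixed along the spatial axis
channel_token = rng.standard_normal((S + 1, 1))   # extra column, mixed along the channel axis

# Insert the spatial distillation token: (S, C) -> (S+1, C).
x = np.concatenate([x, spatial_token], axis=0)

# Insert the channel distillation token: (S+1, C) -> (S+1, C+1).
x = np.concatenate([x, channel_token], axis=1)

# Token-mixing MLPs now operate over S+1 spatial positions and channel-mixing
# MLPs over C+1 channels, so each distillation token participates in exactly
# one mixing dimension, which is what the mutual-information term encourages.
print(x.shape)
```

Running the sketch prints `(197, 513)`, confirming that each dimension gained exactly one token.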

Author Information

Yanxi Li (University of Sydney)
Xinghao Chen (Huawei Noah's Ark Lab)
Minjing Dong (The University of Sydney)
Yehui Tang (Peking University)
Yunhe Wang (Noah's Ark Lab, Huawei Technologies)
Chang Xu (University of Sydney)
