Recently, neural architectures built entirely from multi-layer perceptrons (MLPs) have attracted great research interest from the computer vision community. However, the inefficient mixing of spatial-channel information causes MLP-like vision models to require tremendous pre-training on large-scale datasets. This work tackles the problem from a novel knowledge distillation perspective. We propose Spatial-channel Token Distillation (STD), which improves information mixing in the two dimensions by introducing distillation tokens to each of them. A mutual information regularization is further introduced to let the distillation tokens focus on their specific dimensions and maximize the performance gain. Extensive experiments on ImageNet with several MLP-like architectures demonstrate that the proposed token distillation mechanism efficiently improves accuracy. For example, STD boosts the top-1 accuracy of Mixer-S16 on ImageNet from 73.8% to 75.7% without any costly pre-training on JFT-300M. When applied to stronger architectures, e.g., CycleMLP-B1 and CycleMLP-B2, STD still harvests about 1.1% and 0.5% accuracy gains, respectively.
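To make the idea in the abstract concrete, below is a minimal PyTorch sketch of an MLP-Mixer-style block that appends one learnable distillation token along the spatial (patch) axis and one along the channel axis. The class name, shapes, token placement, and the way the two token outputs are split off are illustrative assumptions for exposition, not the authors' implementation; the distillation losses and the mutual-information regularizer mentioned in the abstract are omitted.

```python
# Sketch only: a Mixer block with spatial/channel distillation tokens (assumed form).
import torch
import torch.nn as nn


class MixerBlockWithSTDTokens(nn.Module):
    def __init__(self, num_patches: int, dim: int, hidden: int = 256):
        super().__init__()
        # Extra learnable token along the patch axis and along the channel axis.
        self.spatial_dist_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.channel_dist_token = nn.Parameter(torch.zeros(1, num_patches + 1, 1))

        self.norm1 = nn.LayerNorm(dim)
        # Token-mixing MLP: mixes across patches (plus the spatial distillation token).
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches + 1, hidden), nn.GELU(), nn.Linear(hidden, num_patches + 1)
        )
        self.norm2 = nn.LayerNorm(dim + 1)
        # Channel-mixing MLP: mixes across channels (plus the channel distillation token).
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.GELU(), nn.Linear(hidden, dim + 1)
        )

    def forward(self, x):  # x: (batch, num_patches, dim)
        b = x.shape[0]
        # Append the spatial distillation token as an extra patch.
        x = torch.cat([x, self.spatial_dist_token.expand(b, -1, -1)], dim=1)
        # Token mixing operates on the patch axis (transpose so patches are last).
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        # Append the channel distillation token as an extra channel.
        x = torch.cat([x, self.channel_dist_token.expand(b, -1, -1)], dim=2)
        # Channel mixing operates on the (extended) channel axis.
        x = x + self.channel_mlp(self.norm2(x))
        # Split off the two distillation outputs; in training they would be matched
        # to a teacher with dimension-specific distillation losses.
        spatial_out = x[:, -1, :-1]   # (batch, dim): output at the extra patch position
        channel_out = x[:, :-1, -1]   # (batch, num_patches): output in the extra channel
        return x[:, :-1, :-1], spatial_out, channel_out


# Example usage with hypothetical sizes:
# blk = MixerBlockWithSTDTokens(num_patches=196, dim=512)
# feats, s_tok, c_tok = blk(torch.randn(2, 196, 512))
```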
Author Information
Yanxi Li (University of Sydney)
Xinghao Chen (Huawei Noah's Ark Lab)
Minjing Dong (The University of Sydney)
Yehui Tang (Peking University)
Yunhe Wang (Noah's Ark Lab, Huawei Technologies)
Chang Xu (University of Sydney)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: Spatial-Channel Token Distillation for Vision MLPs
  Thu, Jul 21st, 08:35 -- 08:40 PM, Room: Ballroom 1 & 2
More from the Same Authors
- 2023 Poster: Dual Focal Loss for Calibration
  Linwei Tao · Minjing Dong · Chang Xu
- 2023 Poster: PixelAsParam: A Gradient View on Diffusion Sampling with Guidance
  Anh-Dung Dinh · Daochang Liu · Chang Xu
- 2022 Poster: Federated Learning with Positive and Unlabeled Data
  Xinyang Lin · Hanting Chen · Yixing Xu · Chao Xu · Xiaolin Gui · Yiping Deng · Yunhe Wang
- 2022 Spotlight: Federated Learning with Positive and Unlabeled Data
  Xinyang Lin · Hanting Chen · Yixing Xu · Chao Xu · Xiaolin Gui · Yiping Deng · Yunhe Wang
- 2021 Poster: Commutative Lie Group VAE for Disentanglement Learning
  Xinqi Zhu · Chang Xu · Dacheng Tao
- 2021 Oral: Commutative Lie Group VAE for Disentanglement Learning
  Xinqi Zhu · Chang Xu · Dacheng Tao
- 2021 Poster: Learning to Weight Imperfect Demonstrations
  Yunke Wang · Chang Xu · Bo Du · Honglak Lee
- 2021 Poster: K-shot NAS: Learnable Weight-Sharing for NAS with K-shot Supernets
  Xiu Su · Shan You · Mingkai Zheng · Fei Wang · Chen Qian · Changshui Zhang · Chang Xu
- 2021 Spotlight: K-shot NAS: Learnable Weight-Sharing for NAS with K-shot Supernets
  Xiu Su · Shan You · Mingkai Zheng · Fei Wang · Chen Qian · Changshui Zhang · Chang Xu
- 2021 Spotlight: Learning to Weight Imperfect Demonstrations
  Yunke Wang · Chang Xu · Bo Du · Honglak Lee
- 2021 Poster: Winograd Algorithm for AdderNet
  Wenshuo Li · Hanting Chen · Mingqiang Huang · Xinghao Chen · Chunjing Xu · Yunhe Wang
- 2021 Spotlight: Winograd Algorithm for AdderNet
  Wenshuo Li · Hanting Chen · Mingqiang Huang · Xinghao Chen · Chunjing Xu · Yunhe Wang
- 2020 Poster: Neural Architecture Search in A Proxy Validation Loss Landscape
  Yanxi Li · Minjing Dong · Yunhe Wang · Chang Xu
- 2020 Poster: Training Binary Neural Networks through Learning with Noisy Supervision
  Kai Han · Yunhe Wang · Yixing Xu · Chunjing Xu · Enhua Wu · Chang Xu
- 2019 Poster: LegoNet: Efficient Convolutional Neural Networks with Lego Filters
  Zhaohui Yang · Yunhe Wang · Chuanjian Liu · Hanting Chen · Chunjing Xu · Boxin Shi · Chao Xu · Chang Xu
- 2019 Oral: LegoNet: Efficient Convolutional Neural Networks with Lego Filters
  Zhaohui Yang · Yunhe Wang · Chuanjian Liu · Hanting Chen · Chunjing Xu · Boxin Shi · Chao Xu · Chang Xu
- 2017 Poster: Beyond Filters: Compact Feature Map for Portable Deep Model
  Yunhe Wang · Chang Xu · Chao Xu · Dacheng Tao
- 2017 Talk: Beyond Filters: Compact Feature Map for Portable Deep Model
  Yunhe Wang · Chang Xu · Chao Xu · Dacheng Tao