Spotlight
Understanding The Robustness in Vision Transformers
Zhou Daquan · Zhiding Yu · Enze Xie · Chaowei Xiao · Animashree Anandkumar · Jiashi Feng · Jose M. Alvarez

Tue Jul 19 11:55 AM -- 12:00 PM (PDT) @ Hall F

Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against various corruptions. Although this property is partly attributed to the self-attention mechanism, there is still a lack of an explanatory framework towards a more systematic understanding. In this paper, we examine the role of self-attention in learning robust representations. Our study is motivated by the intriguing properties of self-attention in visual grouping which indicate that self-attention could promote improved mid-level representation and robustness. We thus propose a family of fully attentional networks (FANs) that incorporate self-attention in both token mixing and channel processing. We validate the design comprehensively on various hierarchical backbones. Our model with a DeiT architecture achieves a state-of-the-art 47.6% mCE on ImageNet-C with 29M parameters. We also demonstrate significantly improved robustness in two downstream tasks: semantic segmentation and object detection.

Author Information

Zhou Daquan (National University of Singapore, Institute of Data Science, Learning and Vision Lab)
Zhiding Yu (NVIDIA)

Zhiding Yu is a Senior Research Scientist at NVIDIA. Before joining NVIDIA in 2018, he received a Ph.D. in ECE from Carnegie Mellon University in 2017 and an M.Phil. in ECE from The Hong Kong University of Science and Technology in 2012. His research interests focus mainly on deep representation learning, weakly/self-supervised learning, transfer learning, and deep structured prediction, with applications to vision and robotics problems.

Enze Xie (The University of Hong Kong)
Chaowei Xiao (University of Michigan)
Animashree Anandkumar (Caltech and NVIDIA)
Jiashi Feng (ByteDance)
Jose M. Alvarez (NVIDIA)
