Training data-efficient image transformers & distillation through attention
Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. These high-performing vision transformers are pre-trained with hundreds of millions of images using a large infrastructure, thereby limiting their adoption.
In this work, we produce competitive convolution-free transformers by training on ImageNet only, using a single computer, in less than 3 days. Our reference vision transformer (86M parameters) achieves a top-1 accuracy of 83.1% (single-crop) on ImageNet with no external data.
We also introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention; the teacher is typically a convnet. The learned transformers are competitive with the state of the art on ImageNet (85.2% top-1 accuracy) and remain competitive when transferred to other tasks. We will share our code and models.
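The distillation token can be illustrated with a short sketch. Below is a minimal PyTorch version, assuming a toy transformer: the class names (DistilledViT, hard_distillation_loss), the hyper-parameters, and the hard-label distillation variant are illustrative stand-ins, not the authors' released implementation. The key idea it demonstrates is that a learned distillation token is appended alongside the class token, interacts with the patch tokens through self-attention, and feeds a second head supervised by the teacher.

```python
# Minimal sketch of distillation through attention, assuming PyTorch.
# All names and hyper-parameters are illustrative, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistilledViT(nn.Module):
    """Tiny vision transformer with a class token and a distillation token."""

    def __init__(self, img_size=32, patch=8, dim=64, depth=2, heads=4, num_classes=10):
        super().__init__()
        num_patches = (img_size // patch) ** 2
        # Patch embedding as a strided convolution.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, dim))  # distillation token
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 2, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_classes)       # supervised by true labels
        self.head_dist = nn.Linear(dim, num_classes)  # supervised by the teacher

    def forward(self, x):
        b = x.size(0)
        patches = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        # Both extra tokens attend to the patch tokens through self-attention.
        tokens = torch.cat([self.cls_token.expand(b, -1, -1),
                            self.dist_token.expand(b, -1, -1),
                            patches], dim=1) + self.pos_embed
        out = self.encoder(tokens)
        return self.head(out[:, 0]), self.head_dist(out[:, 1])

def hard_distillation_loss(logits_cls, logits_dist, teacher_logits, labels):
    """Average CE with the true label and CE with the teacher's hard label."""
    teacher_labels = teacher_logits.argmax(dim=1)
    return 0.5 * (F.cross_entropy(logits_cls, labels)
                  + F.cross_entropy(logits_dist, teacher_labels))

# Usage with random tensors standing in for a batch and a frozen convnet teacher:
x = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))
teacher_logits = torch.randn(4, 10)  # would come from a pre-trained convnet
model = DistilledViT()
loss = hard_distillation_loss(*model(x), teacher_logits, labels)
loss.backward()
```

At inference, the two classifier outputs can be fused (e.g. by summing or averaging their softmax predictions) so that both the label and teacher supervision contribute to the final decision.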
Author Information
Hugo Touvron (Facebook AI Research)
Matthieu Cord (Sorbonne University)
Matthijs Douze (Facebook AI Research)
Francisco Massa (Facebook AI Research)
Alexandre Sablayrolles (Facebook AI)
Herve Jegou (Facebook AI Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Spotlight: Training data-efficient image transformers & distillation through attention
  Thu. Jul 22nd 02:20 -- 02:25 PM
More from the Same Authors
- 2023 Poster: Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano
  Chuan Guo · Alexandre Sablayrolles · Maziar Sanjabi
- 2023 Poster: TAN Without a Burn: Scaling Laws of DP-SGD
  Tom Sander · Pierre Stock · Alexandre Sablayrolles
- 2021 Poster: ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
  Stéphane d'Ascoli · Hugo Touvron · Matthew Leavitt · Ari Morcos · Giulio Biroli · Levent Sagun
- 2021 Spotlight: ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
  Stéphane d'Ascoli · Hugo Touvron · Matthew Leavitt · Ari Morcos · Giulio Biroli · Levent Sagun
- 2020 Poster: Radioactive data: tracing through training
  Alexandre Sablayrolles · Matthijs Douze · Cordelia Schmid · Herve Jegou
- 2019 Poster: White-box vs Black-box: Bayes Optimal Strategies for Membership Inference
  Alexandre Sablayrolles · Matthijs Douze · Cordelia Schmid · Yann Ollivier · Herve Jegou
- 2019 Oral: White-box vs Black-box: Bayes Optimal Strategies for Membership Inference
  Alexandre Sablayrolles · Matthijs Douze · Cordelia Schmid · Yann Ollivier · Herve Jegou
- 2017 Poster: Efficient softmax approximation for GPUs
  Edouard Grave · Armand Joulin · Moustapha Cisse · David Grangier · Herve Jegou
- 2017 Talk: Efficient softmax approximation for GPUs
  Edouard Grave · Armand Joulin · Moustapha Cisse · David Grangier · Herve Jegou