The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters (Chen et al., 2022). We present a recipe for highly efficient and stable training of a 22B-parameter ViT (ViT-22B) and perform a wide variety of experiments on the resulting model. When evaluated on downstream tasks (often with a lightweight linear model on frozen features), ViT-22B demonstrates increasing performance with scale. We further observe other interesting benefits of scale, including an improved tradeoff between fairness and performance, state-of-the-art alignment to human visual perception in terms of shape/texture bias, and improved robustness. ViT-22B demonstrates the potential for "LLM-like" scaling in vision, and provides key steps towards getting there.
Author Information
Mostafa Dehghani
Josip Djolonga (Google)
Basil Mustafa (Google)
Piotr Padlewski (Google Deepmind)
Jonathan Heek (Google)
Justin Gilmer (Google Brain)
Andreas Steiner (Google)

Computer vision research engineer at Google DeepMind. Previously worked in tropical medicine. Educational background: MD, bioelectronics.
Mathilde Caron (Google)
Robert Geirhos (Google DeepMind)
Ibrahim Alabdulmohsin (Google)
Rodolphe Jenatton (Google Research)
Lucas Beyer (Google Brain (Zürich))
Michael Tschannen (Google Brain)
Anurag Arnab (University of Oxford)
Xiao Wang (Google)
Carlos Riquelme (Google Brain)
Matthias Minderer (Google Research)
Joan Puigcerver (Google DeepMind)
Utku Evci (Google)
Manoj Kumar (Google Brain)
Sjoerd van Steenkiste (IDSIA)
Gamaleldin Elsayed (Google DeepMind)
Gamaleldin F. Elsayed is a Research Scientist at Google DeepMind interested in deep learning and computational neuroscience research. In particular, his research focuses on studying properties and problems of artificial neural networks and designing better machine learning models with inspiration from neuroscience. In 2017, he completed his PhD in Neuroscience at Columbia University, at the Center for Theoretical Neuroscience. During his PhD, he contributed to the field of computational neuroscience by designing machine learning methods for identifying and validating structures in complex neural data. Prior to that, he completed his B.S. at The American University in Cairo with a major in Electronics Engineering and a minor in Computer Science, and earned M.S. degrees in electrical engineering from KAUST and Washington University in St. Louis. Before his graduate studies, he was also a professional athlete and Olympic fencer. He competed at the 2008 Olympic Games in Beijing with the Egyptian Saber team.
Aravindh Mahendran (Google)
Fisher Yu (ETH Zurich)
Avital Oliver (Bar Ilan University)
Fantine Huot (Google)
Jasmijn Bastings (Google)
Mark Collier (Google)
Alexey Gritsenko (Google)
Vighnesh N Birodkar (Google)
Cristina Vasconcelos
Yi Tay (Google)
Thomas Mensink (Google Research / University of Amsterdam)
Alexander Kolesnikov (Google Brain)
Filip Pavetic (Google)
Dustin Tran (Google Brain)
Thomas Kipf (Google DeepMind)
Mario Lucic (Google Brain)
Xiaohua Zhai (Google Brain)
Daniel Keysers (Google)
Jeremiah Harmsen (Google)
Jeremiah Harmsen joined Google in 2005, where he has founded efforts such as TensorFlow Hub, TensorFlow Serving and the Machine Learning Ninja Rotation. He focuses on creating the ideas, tools and people to help the world use machine learning. He currently leads the Applied Machine Intelligence group at Google AI Zurich. The team increases the impact of machine learning through consultancy, state-of-the-art infrastructure development, research and education. Jeremiah received a B.S. degree in electrical engineering and computer engineering (2001), an M.S. degree in electrical engineering (2003), an M.S. degree in mathematics (2005) and a Ph.D. in electrical engineering (2005) from Rensselaer Polytechnic Institute, Troy, NY. Jeremiah lives by the lake with his wife, son and daughter in Zurich, Switzerland.
Neil Houlsby (Google)
Related Events (a corresponding poster, oral, or spotlight)
-
2023 Oral: Scaling Vision Transformers to 22 Billion Parameters »
Wed. Jul 26th 03:46 -- 03:54 AM, Meeting Room 313
More from the Same Authors
-
2022 : SI-Score »
Jessica Yung · Rob Romijnders · Alexander Kolesnikov · Lucas Beyer · Josip Djolonga · Neil Houlsby · Sylvain Gelly · Mario Lucic · Xiaohua Zhai -
2022 : Plex: Towards Reliability using Pretrained Large Model Extensions »
Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · Jie Ren · Joost van Amersfoort · Kehang Han · E. Kelly Buchanan · Kevin Murphy · Mark Collier · Mike Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani -
2023 : Don't trust your eyes: on the (un)reliability of feature visualizations »
Robert Geirhos · Roland S. Zimmermann · Blair Bilodeau · Wieland Brendel · Been Kim -
2023 : Three Towers: Flexible Contrastive Learning with Pretrained Image Models »
Jannik Kossen · Mark Collier · Basil Mustafa · Xiao Wang · Xiaohua Zhai · Lucas Beyer · Andreas Steiner · Jesse Berent · Rodolphe Jenatton · Efi Kokiopoulou -
2023 Poster: BiBench: Benchmarking and Analyzing Network Binarization »
Haotong Qin · Mingyuan Zhang · Yifu Ding · Aoyu Li · Zhongang Cai · Ziwei Liu · Fisher Yu · Xianglong Liu -
2023 Poster: Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective »
Michael Sander · Joan Puigcerver · Josip Djolonga · Gabriel Peyré · Mathieu Blondel -
2023 Poster: simple diffusion: End-to-end diffusion for high resolution images »
Emiel Hoogeboom · Jonathan Heek · Tim Salimans -
2023 Poster: Underspecification Presents Challenges for Credibility in Modern Machine Learning »
Alexander D'Amour · Katherine Heller · Dan Moldovan · Ben Adlam · Babak Alipanahi · Alex Beutel · Christina Chen · Jonathan Deaton · Jacob Eisenstein · Matthew Hoffman · Farhad Hormozdiari · Neil Houlsby · Shaobo Hou · Ghassen Jerfel · Alan Karthikesalingam · Mario Lucic · Yian Ma · Cory McLean · Diana Mincu · Akinori Mitani · Andrea Montanari · Zachary Nado · Vivek Natarajan · Christopher Nielson · Thomas F. Osborne · Rajiv Raman · Kim Ramasamy · Rory Sayres · Jessica Schrouff · Martin Seneviratne · Shannon Sequeira · Harini Suresh · Victor Veitch · Maksym Vladymyrov · Xuezhi Wang · Kellie Webster · Steve Yadlowsky · Taedong Yun · Xiaohua Zhai · D. Sculley -
2023 Poster: Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames »
Ondrej Biza · Sjoerd van Steenkiste · Mehdi S. M. Sajjadi · Gamaleldin Elsayed · Aravindh Mahendran · Thomas Kipf -
2023 Poster: Tuning Computer Vision Models With Task Rewards »
André Susano Pinto · Alexander Kolesnikov · Yuge Shi · Lucas Beyer · Xiaohua Zhai -
2023 Poster: Adaptive Computation with Elastic Input Sequence »
Fuzhao Xue · Valerii Likhosherstov · Anurag Arnab · Neil Houlsby · Mostafa Dehghani · Yang You -
2023 Poster: The Flan Collection: Designing Data and Methods for Effective Instruction Tuning »
Shayne Longpre · Le Hou · Tu Vu · Albert Webson · Hyung Won Chung · Yi Tay · Denny Zhou · Quoc Le · Barret Zoph · Jason Wei · Adam Roberts -
2023 Poster: When does Privileged information Explain Away Label Noise? »
Guillermo Ortiz Jimenez · Mark Collier · Anant Nawalgaria · Alexander D'Amour · Jesse Berent · Rodolphe Jenatton · Efi Kokiopoulou -
2023 Poster: A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models »
James Allingham · Jie Ren · Michael Dusenberry · Xiuye Gu · Yin Cui · Dustin Tran · Jeremiah Liu · Balaji Lakshminarayanan -
2023 Poster: The Dormant Neuron Phenomenon in Deep Reinforcement Learning »
Ghada Sokar · Rishabh Agarwal · Pablo Samuel Castro · Utku Evci -
2023 Oral: The Dormant Neuron Phenomenon in Deep Reinforcement Learning »
Ghada Sokar · Rishabh Agarwal · Pablo Samuel Castro · Utku Evci -
2023 Poster: Test-time Adaptation with Slot-Centric Models »
Mihir Prabhudesai · Anirudh Goyal · Sujoy Paul · Sjoerd van Steenkiste · Mehdi S. M. Sajjadi · Gaurav Aggarwal · Thomas Kipf · Deepak Pathak · Katerina Fragkiadaki -
2023 Tutorial: Self-Supervised Learning in Vision: from Research Advances to Best Practices »
Xinlei Chen · Ishan Misra · Randall Balestriero · Mathilde Caron · Christoph Feichtenhofer · Mark Ibrahim -
2022 : Dynamic neural networks: Present and Future »
Neil Houlsby -
2022 Poster: On the Practicality of Deterministic Epistemic Uncertainty »
Janis Postels · Mattia Segù · Tao Sun · Luca Daniel Sieber · Luc Van Gool · Fisher Yu · Federico Tombari -
2022 Poster: Transfer and Marginalize: Explaining Away Label Noise with Privileged Information »
Mark Collier · Rodolphe Jenatton · Efi Kokiopoulou · Jesse Berent -
2022 Spotlight: On the Practicality of Deterministic Epistemic Uncertainty »
Janis Postels · Mattia Segù · Tao Sun · Luca Daniel Sieber · Luc Van Gool · Fisher Yu · Federico Tombari -
2022 Spotlight: Transfer and Marginalize: Explaining Away Label Noise with Privileged Information »
Mark Collier · Rodolphe Jenatton · Efi Kokiopoulou · Jesse Berent -
2021 : Uncertainty Modeling from 50M to 1B »
Dustin Tran -
2021 Poster: Neural Feature Matching in Implicit 3D Representations »
Yunlu Chen · Basura Fernando · Hakan Bilen · Thomas Mensink · Efstratios Gavves -
2021 Spotlight: Neural Feature Matching in Implicit 3D Representations »
Yunlu Chen · Basura Fernando · Hakan Bilen · Thomas Mensink · Efstratios Gavves -
2020 : Keynote #5 Justin Gilmer »
Justin Gilmer -
2020 : Attentive Grouping and Graph Neural Networks for Object-Centric Learning »
Thomas Kipf -
2020 Workshop: Object-Oriented Learning: Perception, Representation, and Reasoning »
Sungjin Ahn · Adam Kosiorek · Jessica Hamrick · Sjoerd van Steenkiste · Yoshua Bengio -
2020 : Invited Talk: Thomas Kipf »
Thomas Kipf -
2020 Poster: The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks »
Jakub Swiatkowski · Kevin Roth · Bastiaan Veeling · Linh Tran · Joshua V Dillon · Jasper Snoek · Stephan Mandt · Tim Salimans · Rodolphe Jenatton · Sebastian Nowozin -
2020 Poster: Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors »
Mike Dusenberry · Ghassen Jerfel · Yeming Wen · Yian Ma · Jasper Snoek · Katherine Heller · Balaji Lakshminarayanan · Dustin Tran -
2020 Poster: Frustratingly Simple Few-Shot Object Detection »
Xin Wang · Thomas Huang · Joseph E Gonzalez · Trevor Darrell · Fisher Yu -
2020 Poster: Weakly-Supervised Disentanglement Without Compromises »
Francesco Locatello · Ben Poole · Gunnar Ratsch · Bernhard Schölkopf · Olivier Bachem · Michael Tschannen -
2020 Poster: Revisiting Spatial Invariance with Low-Rank Local Connectivity »
Gamaleldin Elsayed · Prajit Ramachandran · Jon Shlens · Simon Kornblith -
2020 Poster: Automatic Shortcut Removal for Self-Supervised Representation Learning »
Matthias Minderer · Olivier Bachem · Neil Houlsby · Michael Tschannen -
2020 Poster: How Good is the Bayes Posterior in Deep Neural Networks Really? »
Florian Wenzel · Kevin Roth · Bastiaan Veeling · Jakub Swiatkowski · Linh Tran · Stephan Mandt · Jasper Snoek · Tim Salimans · Rodolphe Jenatton · Sebastian Nowozin -
2019 : Fisher Yu: "Motion and Prediction for Autonomous Driving" »
Fisher Yu · Trevor Darrell -
2019 : poster session I »
Nicholas Rhinehart · Yunhao Tang · Vinay Prabhu · Dian Ang Yap · Alexander Wang · Marc Finzi · Manoj Kumar · You Lu · Abhishek Kumar · Qi Lei · Michael Przystupa · Nicola De Cao · Polina Kirichenko · Pavel Izmailov · Andrew Wilson · Jakob Kruse · Diego Mesquita · Mario Lezcano Casado · Thomas Müller · Keir Simmons · Andrei Atanov -
2019 Workshop: Learning and Reasoning with Graph-Structured Representations »
Ethan Fetaya · Zhiting Hu · Thomas Kipf · Yujia Li · Xiaodan Liang · Renjie Liao · Raquel Urtasun · Hao Wang · Max Welling · Eric Xing · Richard Zemel -
2019 : Poster Session 1 (all papers) »
Matilde Gargiani · Yochai Zur · Chaim Baskin · Evgenii Zheltonozhskii · Liam Li · Ameet Talwalkar · Xuedong Shang · Harkirat Singh Behl · Atilim Gunes Baydin · Ivo Couckuyt · Tom Dhaene · Chieh Lin · Wei Wei · Min Sun · Orchid Majumder · Michele Donini · Yoshihiko Ozaki · Ryan P. Adams · Christian Geißler · Ping Luo · Zhanglin Peng · Ruimao Zhang · John Langford · Rich Caruana · Debadeepta Dey · Charles Weill · Xavi Gonzalvo · Scott Yang · Scott Yak · Eugen Hotaj · Vladimir Macko · Mehryar Mohri · Corinna Cortes · Stefan Webb · Jonathan Chen · Martin Jankowiak · Noah Goodman · Aaron Klein · Frank Hutter · Mojan Javaheripi · Mohammad Samragh · Sungbin Lim · Taesup Kim · Sungwoong Kim · Michael Volpp · Iddo Drori · Yamuna Krishnamurthy · Kyunghyun Cho · Stanislaw Jastrzebski · Quentin de Laroussilhe · Mingxing Tan · Xiao Ma · Neil Houlsby · Andrea Gesmundo · Zalán Borsos · Krzysztof Maziarz · Felipe Petroski Such · Joel Lehman · Kenneth Stanley · Jeff Clune · Pieter Gijsbers · Joaquin Vanschoren · Felix Mohr · Eyke Hüllermeier · Zheng Xiong · Wenpeng Zhang · Wenwu Zhu · Weijia Shao · Aleksandra Faust · Michal Valko · Michael Y Li · Hugo Jair Escalante · Marcel Wever · Andrey Khorlin · Tara Javidi · Anthony Francis · Saurajit Mukherjee · Jungtaek Kim · Michael McCourt · Saehoon Kim · Tackgeun You · Seungjin Choi · Nicolas Knudde · Alexander Tornede · Ghassen Jerfel -
2019 Workshop: Uncertainty and Robustness in Deep Learning »
Sharon Yixuan Li · Dan Hendrycks · Thomas Dietterich · Balaji Lakshminarayanan · Justin Gilmer -
2019 Poster: Adversarial Examples Are a Natural Consequence of Test Error in Noise »
Justin Gilmer · Nicolas Ford · Nicholas Carlini · Ekin Dogus Cubuk -
2019 Poster: A Large-Scale Study on Regularization and Normalization in GANs »
Karol Kurach · Mario Lucic · Xiaohua Zhai · Marcin Michalski · Sylvain Gelly -
2019 Oral: A Large-Scale Study on Regularization and Normalization in GANs »
Karol Kurach · Mario Lucic · Xiaohua Zhai · Marcin Michalski · Sylvain Gelly -
2019 Oral: Adversarial Examples Are a Natural Consequence of Test Error in Noise »
Justin Gilmer · Nicolas Ford · Nicholas Carlini · Ekin Dogus Cubuk -
2019 Poster: Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations »
Francesco Locatello · Stefan Bauer · Mario Lucic · Gunnar Ratsch · Sylvain Gelly · Bernhard Schölkopf · Olivier Bachem -
2019 Poster: High-Fidelity Image Generation With Fewer Labels »
Mario Lucic · Michael Tschannen · Marvin Ritter · Xiaohua Zhai · Olivier Bachem · Sylvain Gelly -
2019 Oral: High-Fidelity Image Generation With Fewer Labels »
Mario Lucic · Michael Tschannen · Marvin Ritter · Xiaohua Zhai · Olivier Bachem · Sylvain Gelly -
2019 Oral: Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations »
Francesco Locatello · Stefan Bauer · Mario Lucic · Gunnar Ratsch · Sylvain Gelly · Bernhard Schölkopf · Olivier Bachem -
2018 Poster: Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) »
Been Kim · Martin Wattenberg · Justin Gilmer · Carrie Cai · James Wexler · Fernanda Viégas · Rory Sayres -
2018 Poster: StrassenNets: Deep Learning with a Multiplication Budget »
Michael Tschannen · Aran Khanna · Animashree Anandkumar -
2018 Poster: Born Again Neural Networks »
Tommaso Furlanello · Zachary Lipton · Michael Tschannen · Laurent Itti · Anima Anandkumar -
2018 Oral: Born Again Neural Networks »
Tommaso Furlanello · Zachary Lipton · Michael Tschannen · Laurent Itti · Anima Anandkumar -
2018 Oral: StrassenNets: Deep Learning with a Multiplication Budget »
Michael Tschannen · Aran Khanna · Animashree Anandkumar -
2018 Oral: Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) »
Been Kim · Martin Wattenberg · Justin Gilmer · Carrie Cai · James Wexler · Fernanda Viégas · Rory Sayres -
2018 Poster: Image Transformer »
Niki Parmar · Ashish Vaswani · Jakob Uszkoreit · Lukasz Kaiser · Noam Shazeer · Alexander Ku · Dustin Tran -
2018 Oral: Image Transformer »
Niki Parmar · Ashish Vaswani · Jakob Uszkoreit · Lukasz Kaiser · Noam Shazeer · Alexander Ku · Dustin Tran -
2018 Poster: Neural Relational Inference for Interacting Systems »
Thomas Kipf · Ethan Fetaya · Kuan-Chieh Wang · Max Welling · Richard Zemel -
2018 Oral: Neural Relational Inference for Interacting Systems »
Thomas Kipf · Ethan Fetaya · Kuan-Chieh Wang · Max Welling · Richard Zemel -
2017 Poster: Neural Message Passing for Quantum Chemistry »
Justin Gilmer · Samuel Schoenholz · Patrick F Riley · Oriol Vinyals · George Dahl -
2017 Talk: Neural Message Passing for Quantum Chemistry »
Justin Gilmer · Samuel Schoenholz · Patrick F Riley · Oriol Vinyals · George Dahl -
2017 Poster: Input Switched Affine Networks: An RNN Architecture Designed for Interpretability »
Jakob Foerster · Justin Gilmer · Jan Chorowski · Jascha Sohl-Dickstein · David Sussillo -
2017 Talk: Input Switched Affine Networks: An RNN Architecture Designed for Interpretability »
Jakob Foerster · Justin Gilmer · Jan Chorowski · Jascha Sohl-Dickstein · David Sussillo