Oral
Image Transformer
Niki Parmar · Ashish Vaswani · Jakob Uszkoreit · Lukasz Kaiser · Noam Shazeer · Alexander Ku · Dustin Tran

Thu Jul 12 06:20 AM -- 06:30 AM (PDT) @ Victoria

Image generation has been successfully cast as an autoregressive sequence generation or transformation problem. Recent work has shown that self-attention is an effective way of modeling textual sequences. In this work, we generalize a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood. By restricting the self-attention mechanism to attend to local neighborhoods we significantly increase the size of images the model can process in practice, despite maintaining significantly larger receptive fields per layer than typical convolutional neural networks. While conceptually simple, our generative models significantly outperform the current state of the art in image generation on ImageNet, improving the best published negative log-likelihood on ImageNet from 3.83 to 3.77. We also present results on image super-resolution with a large magnification ratio, applying an encoder-decoder configuration of our architecture. In a human evaluation study, we find that images generated by our super-resolution model fool human observers three times more often than the previous state of the art.
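To make the locality restriction concrete, the sketch below shows block-local self-attention over a flattened pixel sequence in NumPy: each query attends only to keys in its own neighborhood, so cost grows with the block size rather than with the squared sequence length. The function name, shapes, and block size are illustrative assumptions, not the authors' implementation, which also applies causal masking for autoregressive generation (omitted here for brevity).

    # Minimal sketch of neighborhood-restricted ("local") self-attention over a
    # flattened image sequence. Illustrative only; causal masking is omitted.
    import numpy as np

    def local_self_attention(x, w_q, w_k, w_v, block=8):
        """x: (seq_len, d_model) flattened pixel embeddings.
        Each query attends only to keys inside its length-`block`
        neighborhood, giving O(seq_len * block) cost instead of the
        O(seq_len ** 2) cost of full self-attention."""
        seq_len, d = x.shape
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        out = np.zeros_like(v)
        for start in range(0, seq_len, block):
            end = min(start + block, seq_len)
            # Attention scores restricted to the local neighborhood [start, end).
            scores = q[start:end] @ k[start:end].T / np.sqrt(d)
            # Numerically stable softmax over each row.
            weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
            weights /= weights.sum(axis=-1, keepdims=True)
            out[start:end] = weights @ v[start:end]
        return out

    # Toy usage: a 16x16 image flattened to a length-256 sequence.
    rng = np.random.default_rng(0)
    d_model = 32
    x = rng.standard_normal((16 * 16, d_model))
    w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))
    y = local_self_attention(x, w_q, w_k, w_v, block=8)
    print(y.shape)  # (256, 32)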

Author Information

Niki Parmar (Google)
Ashish Vaswani (Google Brain)
Jakob Uszkoreit (Google)
Lukasz Kaiser (Google)
Noam Shazeer (Google)
Alexander Ku (UC Berkeley)
Dustin Tran (Google)

More from the Same Authors

  • 2022 : Plex: Towards Reliability using Pretrained Large Model Extensions »
    Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · Jie Ren · Joost van Amersfoort · Kehang Han · E. Kelly Buchanan · Kevin Murphy · Mark Collier · Mike Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J. Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani
  • 2023 Poster: Scaling Vision Transformers to 22 Billion Parameters »
    Mostafa Dehghani · Josip Djolonga · Basil Mustafa · Piotr Padlewski · Jonathan Heek · Justin Gilmer · Andreas Steiner · Mathilde Caron · Robert Geirhos · Ibrahim Alabdulmohsin · Rodolphe Jenatton · Lucas Beyer · Michael Tschannen · Anurag Arnab · Xiao Wang · Carlos Riquelme · Matthias Minderer · Joan Puigcerver · Utku Evci · Manoj Kumar · Sjoerd van Steenkiste · Gamaleldin Elsayed · Aravindh Mahendran · Fisher Yu · Avital Oliver · Fantine Huot · Jasmijn Bastings · Mark Collier · Alexey Gritsenko · Vighnesh N Birodkar · Cristina Vasconcelos · Yi Tay · Thomas Mensink · Alexander Kolesnikov · Filip Pavetic · Dustin Tran · Thomas Kipf · Mario Lucic · Xiaohua Zhai · Daniel Keysers · Jeremiah Harmsen · Neil Houlsby
  • 2023 Poster: A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models »
    James Allingham · Jie Ren · Michael Dusenberry · Xiuye Gu · Yin Cui · Dustin Tran · Jeremiah Liu · Balaji Lakshminarayanan
  • 2023 Oral: Scaling Vision Transformers to 22 Billion Parameters »
    Mostafa Dehghani · Josip Djolonga · Basil Mustafa · Piotr Padlewski · Jonathan Heek · Justin Gilmer · Andreas Steiner · Mathilde Caron · Robert Geirhos · Ibrahim Alabdulmohsin · Rodolphe Jenatton · Lucas Beyer · Michael Tschannen · Anurag Arnab · Xiao Wang · Carlos Riquelme · Matthias Minderer · Joan Puigcerver · Utku Evci · Manoj Kumar · Sjoerd van Steenkiste · Gamaleldin Elsayed · Aravindh Mahendran · Fisher Yu · Avital Oliver · Fantine Huot · Jasmijn Bastings · Mark Collier · Alexey Gritsenko · Vighnesh N Birodkar · Cristina Vasconcelos · Yi Tay · Thomas Mensink · Alexander Kolesnikov · Filip Pavetic · Dustin Tran · Thomas Kipf · Mario Lucic · Xiaohua Zhai · Daniel Keysers · Jeremiah Harmsen · Neil Houlsby
  • 2021 : Uncertainty Modeling from 50M to 1B »
    Dustin Tran
  • 2021 Tutorial: Self-Attention for Computer Vision »
    Aravind Srinivas · Prajit Ramachandran · Ashish Vaswani
  • 2021 : Self-Attention for Computer Vision »
    Ashish Vaswani
  • 2020 Poster: Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors »
    Mike Dusenberry · Ghassen Jerfel · Yeming Wen · Yian Ma · Jasper Snoek · Katherine Heller · Balaji Lakshminarayanan · Dustin Tran
  • 2019 Poster: Area Attention »
    Yang Li · Lukasz Kaiser · Samy Bengio · Si Si
  • 2019 Oral: Area Attention »
    Yang Li · Lukasz Kaiser · Samy Bengio · Si Si
  • 2018 Poster: Fast Decoding in Sequence Models Using Discrete Latent Variables »
    Lukasz Kaiser · Samy Bengio · Aurko Roy · Ashish Vaswani · Niki Parmar · Jakob Uszkoreit · Noam Shazeer
  • 2018 Oral: Fast Decoding in Sequence Models Using Discrete Latent Variables »
    Lukasz Kaiser · Samy Bengio · Aurko Roy · Ashish Vaswani · Niki Parmar · Jakob Uszkoreit · Noam Shazeer
  • 2018 Poster: Adafactor: Adaptive Learning Rates with Sublinear Memory Cost »
    Noam Shazeer · Mitchell Stern
  • 2018 Oral: Adafactor: Adaptive Learning Rates with Sublinear Memory Cost »
    Noam Shazeer · Mitchell Stern