Biological systems understand the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, touch, and proprioception. The perception models used in deep learning, on the other hand, are designed for individual modalities, often relying on domain-specific assumptions such as the local grid structures exploited by virtually all existing vision models. These priors introduce helpful inductive biases, but also lock models to individual modalities. In this paper we introduce the Perceiver – a model that builds upon Transformers and hence makes few architectural assumptions about the relationship between its inputs, but that also scales to hundreds of thousands of inputs, like ConvNets. The model leverages an asymmetric attention mechanism to iteratively distill inputs into a tight latent bottleneck, allowing it to scale to handle very large inputs. We show that this architecture is competitive with or outperforms strong, specialized models on classification tasks across various modalities: images, point clouds, audio, video and video+audio. The Perceiver obtains performance comparable to ResNet-50 and ViT on ImageNet without 2D convolutions by directly attending to 50,000 pixels. It is also competitive in all modalities in AudioSet.
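The asymmetric attention described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a small latent array of N rows that queries a much larger input array of M rows, so the score matrix is N x M rather than M x M. The function names (`attention_weights`, `cross_attend`, `distill`), the toy sizes, the random projection matrices, and the choice to share weights across iterations are all hypothetical. The latent self-attention blocks, MLPs, normalization, and positional encodings of the full model are omitted.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Gaussian matrix; stands in for learned parameters or real data."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def attention_weights(latents, inputs, w_q, w_k):
    """Return the N x M weights: each latent row attends over all inputs."""
    q = matmul(latents, w_q)                  # N x D queries from the latents
    k = matmul(inputs, w_k)                   # M x D keys from the inputs
    k_t = [list(col) for col in zip(*k)]      # D x M
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, k_t)                   # N x M, linear in M
    return [softmax([s * scale for s in row]) for row in scores]


def cross_attend(latents, inputs, w_q, w_k, w_v):
    """One cross-attention step with a residual update of the latents."""
    weights = attention_weights(latents, inputs, w_q, w_k)
    v = matmul(inputs, w_v)                   # M x D values from the inputs
    update = matmul(weights, v)               # N x D
    return [[z + u for z, u in zip(z_row, u_row)]
            for z_row, u_row in zip(latents, update)]


def distill(latents, inputs, w_q, w_k, w_v, steps):
    """Repeatedly re-read the same inputs into the latent bottleneck."""
    for _ in range(steps):
        latents = cross_attend(latents, inputs, w_q, w_k, w_v)
    return latents


rng = random.Random(0)
inputs = random_matrix(64, 3, rng, scale=1.0)   # M=64 input elements, C=3 channels
latents = random_matrix(4, 8, rng)              # N=4 latent rows, D=8 features
w_q = random_matrix(8, 8, rng)
w_k = random_matrix(3, 8, rng)
w_v = random_matrix(3, 8, rng)

final = distill(latents, inputs, w_q, w_k, w_v, steps=3)
print(len(final), len(final[0]))
```

In this sketch the latent array keeps its N x D shape however many inputs are supplied, which is the property that lets the bottleneck decouple compute from input size.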
Author Information
Drew Jaegle (DeepMind)
Felix Axel Gimeno Gil (DeepMind)
Studied for two bachelor's degrees (one in Mathematics, the other in Computer Science) at the Technical University of Catalonia (Spain). Now working as a Research Engineer at Google DeepMind, London, UK.
Andy Brock (DeepMind)
Oriol Vinyals (Google DeepMind)
Oriol Vinyals is a Research Scientist at Google. He works in deep learning with the Google Brain team. Oriol holds a Ph.D. in EECS from the University of California, Berkeley, and a Master's degree from the University of California, San Diego. He is a recipient of the 2011 Microsoft Research PhD Fellowship. He was an early adopter of the new deep learning wave at Berkeley, and in his thesis he focused on non-convex optimization and recurrent neural networks. At Google Brain he continues working on his areas of interest, which include artificial intelligence, with particular emphasis on machine learning, language, and vision.
Andrew Zisserman (Oxford University & Google DeepMind)
Joao Carreira (DeepMind)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Spotlight: Perceiver: General Perception with Iterative Attention »
Tue. Jul 20th 02:25 -- 02:30 PM
More from the Same Authors
-
2022 : Chinchillas, Flamingos, and Gatos: Few-Shot Learning through Pre-training »
Oriol Vinyals -
2022 Poster: General-purpose, long-context autoregressive modeling with Perceiver AR »
Curtis Hawthorne · Drew Jaegle · Cătălina Cangea · Sebastian Borgeaud · Charlie Nash · Mateusz Malinowski · Sander Dieleman · Oriol Vinyals · Matthew Botvinick · Ian Simon · Hannah Sheahan · Neil Zeghidour · Jean-Baptiste Alayrac · Joao Carreira · Jesse Engel -
2022 Spotlight: General-purpose, long-context autoregressive modeling with Perceiver AR »
Curtis Hawthorne · Drew Jaegle · Cătălina Cangea · Sebastian Borgeaud · Charlie Nash · Mateusz Malinowski · Sander Dieleman · Oriol Vinyals · Matthew Botvinick · Ian Simon · Hannah Sheahan · Neil Zeghidour · Jean-Baptiste Alayrac · Joao Carreira · Jesse Engel -
2022 Poster: Improving Language Models by Retrieving from Trillions of Tokens »
Sebastian Borgeaud · Arthur Mensch · Jordan Hoffmann · Trevor Cai · Eliza Rutherford · Katie Millican · George van den Driessche · Jean-Baptiste Lespiau · Bogdan Damoc · Aidan Clark · Diego de Las Casas · Aurelia Guy · Jacob Menick · Roman Ring · Tom Hennigan · Saffron Huang · Loren Maggiore · Chris Jones · Albin Cassirer · Andy Brock · Michela Paganini · Geoffrey Irving · Oriol Vinyals · Simon Osindero · Karen Simonyan · Jack Rae · Erich Elsen · Laurent Sifre -
2022 Poster: Unified Scaling Laws for Routed Language Models »
Aidan Clark · Diego de Las Casas · Aurelia Guy · Arthur Mensch · Michela Paganini · Jordan Hoffmann · Bogdan Damoc · Blake Hechtman · Trevor Cai · Sebastian Borgeaud · George van den Driessche · Eliza Rutherford · Tom Hennigan · Matthew Johnson · Albin Cassirer · Chris Jones · Elena Buchatskaya · David Budden · Laurent Sifre · Simon Osindero · Oriol Vinyals · Marc'Aurelio Ranzato · Jack Rae · Erich Elsen · Koray Kavukcuoglu · Karen Simonyan -
2022 Spotlight: Improving Language Models by Retrieving from Trillions of Tokens »
Sebastian Borgeaud · Arthur Mensch · Jordan Hoffmann · Trevor Cai · Eliza Rutherford · Katie Millican · George van den Driessche · Jean-Baptiste Lespiau · Bogdan Damoc · Aidan Clark · Diego de Las Casas · Aurelia Guy · Jacob Menick · Roman Ring · Tom Hennigan · Saffron Huang · Loren Maggiore · Chris Jones · Albin Cassirer · Andy Brock · Michela Paganini · Geoffrey Irving · Oriol Vinyals · Simon Osindero · Karen Simonyan · Jack Rae · Erich Elsen · Laurent Sifre -
2022 Oral: Unified Scaling Laws for Routed Language Models »
Aidan Clark · Diego de Las Casas · Aurelia Guy · Arthur Mensch · Michela Paganini · Jordan Hoffmann · Bogdan Damoc · Blake Hechtman · Trevor Cai · Sebastian Borgeaud · George van den Driessche · Eliza Rutherford · Tom Hennigan · Matthew Johnson · Albin Cassirer · Chris Jones · Elena Buchatskaya · David Budden · Laurent Sifre · Simon Osindero · Oriol Vinyals · Marc'Aurelio Ranzato · Jack Rae · Erich Elsen · Koray Kavukcuoglu · Karen Simonyan -
2021 Poster: High-Performance Large-Scale Image Recognition Without Normalization »
Andy Brock · Soham De · Samuel Smith · Karen Simonyan -
2021 Spotlight: High-Performance Large-Scale Image Recognition Without Normalization »
Andy Brock · Soham De · Samuel Smith · Karen Simonyan -
2021 Poster: Vector Quantized Models for Planning »
Sherjil Ozair · Yazhe Li · Ali Razavi · Ioannis Antonoglou · Aäron van den Oord · Oriol Vinyals -
2021 Poster: Imitation by Predicting Observations »
Drew Jaegle · Yury Sulsky · Arun Ahuja · Jake Bruce · Rob Fergus · Greg Wayne -
2021 Spotlight: Vector Quantized Models for Planning »
Sherjil Ozair · Yazhe Li · Ali Razavi · Ioannis Antonoglou · Aäron van den Oord · Oriol Vinyals -
2021 Spotlight: Imitation by Predicting Observations »
Drew Jaegle · Yury Sulsky · Arun Ahuja · Jake Bruce · Rob Fergus · Greg Wayne -
2019 Poster: Graph Matching Networks for Learning the Similarity of Graph Structured Objects »
Yujia Li · Chenjie Gu · Thomas Dullien · Oriol Vinyals · Pushmeet Kohli -
2019 Oral: Graph Matching Networks for Learning the Similarity of Graph Structured Objects »
Yujia Li · Chenjie Gu · Thomas Dullien · Oriol Vinyals · Pushmeet Kohli -
2018 Poster: Parallel WaveNet: Fast High-Fidelity Speech Synthesis »
Aäron van den Oord · Yazhe Li · Igor Babuschkin · Karen Simonyan · Oriol Vinyals · Koray Kavukcuoglu · George van den Driessche · Edward Lockhart · Luis C Cobo · Florian Stimberg · Norman Casagrande · Dominik Grewe · Seb Noury · Sander Dieleman · Erich Elsen · Nal Kalchbrenner · Heiga Zen · Alex Graves · Helen King · Tom Walters · Dan Belov · Demis Hassabis -
2018 Oral: Parallel WaveNet: Fast High-Fidelity Speech Synthesis »
Aäron van den Oord · Yazhe Li · Igor Babuschkin · Karen Simonyan · Oriol Vinyals · Koray Kavukcuoglu · George van den Driessche · Edward Lockhart · Luis C Cobo · Florian Stimberg · Norman Casagrande · Dominik Grewe · Seb Noury · Sander Dieleman · Erich Elsen · Nal Kalchbrenner · Heiga Zen · Alex Graves · Helen King · Tom Walters · Dan Belov · Demis Hassabis -
2018 Poster: Synthesizing Programs for Images using Reinforced Adversarial Learning »
Iaroslav Ganin · Tejas Kulkarni · Igor Babuschkin · S. M. Ali Eslami · Oriol Vinyals -
2018 Oral: Synthesizing Programs for Images using Reinforced Adversarial Learning »
Iaroslav Ganin · Tejas Kulkarni · Igor Babuschkin · S. M. Ali Eslami · Oriol Vinyals -
2018 Poster: Learning to search with MCTSnets »
Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver -
2018 Poster: Learning Implicit Generative Models with the Method of Learned Moments »
Suman Ravuri · Shakir Mohamed · Mihaela Rosca · Oriol Vinyals -
2018 Oral: Learning Implicit Generative Models with the Method of Learned Moments »
Suman Ravuri · Shakir Mohamed · Mihaela Rosca · Oriol Vinyals -
2018 Oral: Learning to search with MCTSnets »
Arthur Guez · Theophane Weber · Ioannis Antonoglou · Karen Simonyan · Oriol Vinyals · Daan Wierstra · Remi Munos · David Silver -
2017 Workshop: Video Games and Machine Learning »
Gabriel Synnaeve · Julian Togelius · Tom Schaul · Oriol Vinyals · Nicolas Usunier -
2017 Poster: Neural Message Passing for Quantum Chemistry »
Justin Gilmer · Samuel Schoenholz · Patrick F Riley · Oriol Vinyals · George Dahl -
2017 Poster: Neural Episodic Control »
Alexander Pritzel · Benigno Uria · Srinivasan Sriram · Adrià Puigdomenech Badia · Oriol Vinyals · Demis Hassabis · Daan Wierstra · Charles Blundell -
2017 Talk: Neural Message Passing for Quantum Chemistry »
Justin Gilmer · Samuel Schoenholz · Patrick F Riley · Oriol Vinyals · George Dahl -
2017 Talk: Neural Episodic Control »
Alexander Pritzel · Benigno Uria · Srinivasan Sriram · Adrià Puigdomenech Badia · Oriol Vinyals · Demis Hassabis · Daan Wierstra · Charles Blundell -
2017 Poster: Decoupled Neural Interfaces using Synthetic Gradients »
Max Jaderberg · Wojciech Czarnecki · Simon Osindero · Oriol Vinyals · Alex Graves · David Silver · Koray Kavukcuoglu -
2017 Poster: Understanding Synthetic Gradients and Decoupled Neural Interfaces »
Wojciech Czarnecki · Grzegorz Świrszcz · Max Jaderberg · Simon Osindero · Oriol Vinyals · Koray Kavukcuoglu -
2017 Poster: Video Pixel Networks »
Nal Kalchbrenner · Karen Simonyan · Aäron van den Oord · Ivo Danihelka · Oriol Vinyals · Alex Graves · Koray Kavukcuoglu -
2017 Talk: Video Pixel Networks »
Nal Kalchbrenner · Karen Simonyan · Aäron van den Oord · Ivo Danihelka · Oriol Vinyals · Alex Graves · Koray Kavukcuoglu -
2017 Talk: Understanding Synthetic Gradients and Decoupled Neural Interfaces »
Wojciech Czarnecki · Grzegorz Świrszcz · Max Jaderberg · Simon Osindero · Oriol Vinyals · Koray Kavukcuoglu -
2017 Talk: Decoupled Neural Interfaces using Synthetic Gradients »
Max Jaderberg · Wojciech Czarnecki · Simon Osindero · Oriol Vinyals · Alex Graves · David Silver · Koray Kavukcuoglu -
2017 Tutorial: Sequence-To-Sequence Modeling with Neural Networks »
Oriol Vinyals · Navdeep Jaitly