Perceiver: General Perception with Iterative Attention
Drew Jaegle · Felix Axel Gimeno Gil · Andy Brock · Oriol Vinyals · Andrew Zisserman · Joao Carreira

Tue Jul 20 07:25 AM -- 07:30 AM (PDT)

Biological systems understand the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, touch, proprioception, etc. The perception models used in deep learning, on the other hand, are designed for individual modalities, often relying on domain-specific assumptions such as the local grid structures exploited by virtually all existing vision models. These priors introduce helpful inductive biases, but also lock models to individual modalities. In this paper we introduce the Perceiver – a model that builds upon Transformers and hence makes few architectural assumptions about the relationship between its inputs, but that also scales to hundreds of thousands of inputs, like ConvNets. The model leverages an asymmetric attention mechanism to iteratively distill inputs into a tight latent bottleneck, allowing it to scale to handle very large inputs. We show that this architecture is competitive with or outperforms strong, specialized models on classification tasks across various modalities: images, point clouds, audio, video, and video+audio. The Perceiver obtains performance comparable to ResNet-50 and ViT on ImageNet without 2D convolutions by directly attending to 50,000 pixels. It is also competitive in all modalities in AudioSet.
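
The asymmetric attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, omits the learned projections and any latent self-attention layers a full model would likely use, and all names (`cross_attend`, `distill`) and sizes are hypothetical. It only shows that queries come from a small latent array while keys and values come from the large input, so the cost grows with N x M rather than M x M.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(latents, inputs):
    """Single-head cross-attention: queries from latents, keys/values from inputs.

    Learned query/key/value projections are omitted (an assumption of this
    sketch), so latents and inputs must share the same feature width.
    """
    dim = len(latents[0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for query in latents:
        # One score per input element: N * M dot products in total.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in inputs]
        weights = softmax(scores)
        # Weighted average of the input vectors.
        output.append([
            sum(w * value[d] for w, value in zip(weights, inputs))
            for d in range(dim)
        ])
    return output


def distill(inputs, latents, num_iterations=2):
    """Repeatedly refine a small latent array by attending to the full input."""
    for _ in range(num_iterations):
        update = cross_attend(latents, inputs)
        # Residual update keeps the latent shape fixed at N x dim.
        latents = [[l + u for l, u in zip(lat, upd)]
                   for lat, upd in zip(latents, update)]
    return latents


rng = random.Random(0)
num_inputs, num_latents, dim = 1000, 8, 4  # illustrative sizes only
inputs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_inputs)]
latents = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_latents)]
result = distill(inputs, latents)
print(len(result), len(result[0]))  # prints "8 4"
```

However many input elements are supplied, the latent array keeps its fixed shape, which is the bottleneck property the abstract refers to.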

Author Information

Drew Jaegle (DeepMind)
Felix Axel Gimeno Gil (DeepMind)

Studied two bachelor's degrees (one in Mathematics, the other in Computer Science) at the Technical University of Catalonia (Spain). Now working as a Research Engineer at Google DeepMind, London, UK.

Andy Brock (DeepMind)
Oriol Vinyals (Google DeepMind)

Oriol Vinyals is a Research Scientist at Google. He works in deep learning with the Google Brain team. Oriol holds a Ph.D. in EECS from the University of California, Berkeley, and a Master's degree from the University of California, San Diego. He is a recipient of the 2011 Microsoft Research PhD Fellowship. He was an early adopter of the new deep learning wave at Berkeley, and in his thesis he focused on non-convex optimization and recurrent neural networks. At Google Brain he continues working on his areas of interest, which include artificial intelligence, with particular emphasis on machine learning, language, and vision.

Andrew Zisserman (Oxford University & Google DeepMind)
Joao Carreira (DeepMind)
