Lipschitz constants of neural networks have been explored in various contexts in deep learning, such as provable adversarial robustness, estimating Wasserstein distance, stabilising training of GANs, and formulating invertible neural networks. Such works have focused on bounding the Lipschitz constant of fully connected or convolutional networks, composed of linear maps and pointwise non-linearities. In this paper, we investigate the Lipschitz constant of self-attention, a non-linear neural network module widely used in sequence modelling. We prove that the standard dot-product self-attention is not Lipschitz for an unbounded input domain, and propose an alternative L2 self-attention that is Lipschitz. We derive an upper bound on the Lipschitz constant of L2 self-attention and provide empirical evidence for its asymptotic tightness. To demonstrate the practical relevance of our theoretical work, we formulate invertible self-attention and use it in a Transformer-based architecture for a character-level language modelling task.
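The sketch below illustrates the contrast the abstract describes: standard scaled dot-product self-attention versus an L2-distance-based variant in the spirit of the paper's L2 self-attention. It is a minimal single-head, unmasked sketch, not the authors' reference implementation; the weight names (W_q, W_k, W_v) and the exact form of the L2 logits are illustrative assumptions based only on the abstract's description.

```python
# Minimal sketch (assumptions noted in comments), NumPy only.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(X, W_q, W_k, W_v):
    """Standard scaled dot-product self-attention (shown in the paper to be
    non-Lipschitz on an unbounded input domain)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    logits = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(logits) @ V

def l2_attention(X, W_q, W_v):
    """L2 variant: logits are negative squared L2 distances between queries and
    keys. Tying the key projection to the query projection is an assumption here,
    reflecting the kind of restriction used to obtain a Lipschitz bound."""
    Q = X @ W_q
    K = Q  # tied query/key projection (illustrative assumption)
    sq_dists = ((Q[:, None, :] - K[None, :, :]) ** 2).sum(-1)  # (N, N) pairwise distances
    logits = -sq_dists / np.sqrt(Q.shape[-1])
    return softmax(logits) @ (X @ W_v)

rng = np.random.default_rng(0)
N, D = 5, 8                                   # sequence length, model dimension
X = rng.normal(size=(N, D))
W_q, W_k, W_v = (rng.normal(size=(D, D)) * D ** -0.5 for _ in range(3))
print(dot_product_attention(X, W_q, W_k, W_v).shape)  # (5, 8)
print(l2_attention(X, W_q, W_v).shape)                # (5, 8)
```

Intuitively, the dot-product logits can grow without bound as the inputs grow, which is behind the non-Lipschitz behaviour; the L2 variant replaces them with distance-based logits, for which the paper derives an upper bound on the Lipschitz constant.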
Author Information
Hyunjik Kim (DeepMind)
George Papamakarios (DeepMind)
Andriy Mnih (DeepMind)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: The Lipschitz Constant of Self-Attention »
  Wed. Jul 21st, 04:00 -- 06:00 PM
More from the Same Authors
- 2023 Poster: Learning Instance-Specific Augmentations by Capturing Local Invariances »
  Ning Miao · Tom Rainforth · Emile Mathieu · Yann Dubois · Yee-Whye Teh · Adam Foster · Hyunjik Kim
- 2023 Poster: Compositional Score Modeling for Simulation-Based Inference »
  Tomas Geffner · George Papamakarios · Andriy Mnih
- 2022 Poster: From data to functa: Your data point is a function and you can treat it like one »
  Emilien Dupont · Hyunjik Kim · S. M. Ali Eslami · Danilo J. Rezende · Dan Rosenbaum
- 2022 Spotlight: From data to functa: Your data point is a function and you can treat it like one »
  Emilien Dupont · Hyunjik Kim · S. M. Ali Eslami · Danilo J. Rezende · Dan Rosenbaum
- 2021 Workshop: INNF+: Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models »
  Chin-Wei Huang · David Krueger · Rianne Van den Berg · George Papamakarios · Ricky T. Q. Chen · Danilo J. Rezende
- 2021 Poster: Generalized Doubly Reparameterized Gradient Estimators »
  Matthias Bauer · Andriy Mnih
- 2021 Spotlight: Generalized Doubly Reparameterized Gradient Estimators »
  Matthias Bauer · Andriy Mnih
- 2021 Poster: LieTransformer: Equivariant Self-Attention for Lie Groups »
  Michael Hutchinson · Charline Le Lan · Sheheryar Zaidi · Emilien Dupont · Yee-Whye Teh · Hyunjik Kim
- 2021 Spotlight: LieTransformer: Equivariant Self-Attention for Lie Groups »
  Michael Hutchinson · Charline Le Lan · Sheheryar Zaidi · Emilien Dupont · Yee-Whye Teh · Hyunjik Kim
- 2020 Workshop: INNF+: Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models »
  Chin-Wei Huang · David Krueger · Rianne Van den Berg · George Papamakarios · Chris Cremer · Ricky T. Q. Chen · Danilo J. Rezende
- 2020 Poster: MetaFun: Meta-Learning with Iterative Functional Updates »
  Jin Xu · Jean-Francois Ton · Hyunjik Kim · Adam Kosiorek · Yee-Whye Teh
- 2020 Poster: On Contrastive Learning for Likelihood-free Inference »
  Conor Durkan · Iain Murray · George Papamakarios
- 2020 Poster: Normalizing Flows on Tori and Spheres »
  Danilo J. Rezende · George Papamakarios · Sebastien Racaniere · Michael Albergo · Gurtej Kanwar · Phiala Shanahan · Kyle Cranmer
- 2019 Workshop: Invertible Neural Networks and Normalizing Flows »
  Chin-Wei Huang · David Krueger · Rianne Van den Berg · George Papamakarios · Aidan Gomez · Chris Cremer · Aaron Courville · Ricky T. Q. Chen · Danilo J. Rezende
- 2018 Poster: Disentangling by Factorising »
  Hyunjik Kim · Andriy Mnih
- 2018 Oral: Disentangling by Factorising »
  Hyunjik Kim · Andriy Mnih