Training neural networks under a strict Lipschitz constraint is useful for provable adversarial robustness, generalization bounds, interpretable gradients, and Wasserstein distance estimation. By the composition property of Lipschitz functions, it suffices to ensure that each individual affine transformation or nonlinear activation is 1-Lipschitz. The challenge is to do this while maintaining the expressive power. We identify a necessary property for such an architecture: each of the layers must preserve the gradient norm during backpropagation. Based on this, we propose to combine a gradient norm preserving activation function, GroupSort, with norm-constrained weight matrices. We show that norm-constrained GroupSort architectures are universal Lipschitz function approximators. Empirically, we show that norm-constrained GroupSort networks achieve tighter estimates of Wasserstein distance than their ReLU counterparts and can achieve provable adversarial robustness guarantees with little cost to accuracy.
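The abstract's two ingredients can be illustrated concretely. Below is a minimal sketch (assuming PyTorch; function names and defaults are illustrative, not the authors' reference implementation) of a GroupSort activation, which only permutes its inputs and therefore preserves the gradient norm, together with a Björck-style iteration that approximately orthonormalizes a weight matrix so the affine layer stays 1-Lipschitz.

```python
import torch

def group_sort(x, group_size=2):
    """GroupSort activation: sort entries within contiguous feature groups.

    Sorting only permutes its inputs, so the Jacobian is a permutation
    matrix; the activation is 1-Lipschitz and preserves the gradient norm
    during backpropagation. group_size=2 recovers the MaxMin variant.
    """
    n, d = x.shape
    assert d % group_size == 0, "feature dimension must be divisible by group size"
    grouped = x.view(n, d // group_size, group_size)
    values, _ = torch.sort(grouped, dim=-1)
    return values.view(n, d)

def bjorck_orthonormalize(w, iters=20, beta=0.5):
    """Approximately orthonormalize a weight matrix via a Björck iteration,
    constraining the layer's spectral norm to (at most) 1.

    Assumes w has already been scaled so its singular values are not too
    large (e.g., divide by an upper bound on its norm before iterating).
    """
    for _ in range(iters):
        w = (1 + beta) * w - beta * (w @ w.t() @ w)
    return w
```

Stacking norm-constrained linear layers with GroupSort in between then yields a network that is 1-Lipschitz by construction, which is the composition argument the abstract relies on.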
Author Information
Cem Anil (University of Toronto)
James Lucas (University of Toronto)
Roger Grosse (University of Toronto and Vector Institute)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: Sorting Out Lipschitz Function Approximation
  Wed. Jun 12th 09:35 -- 09:40 PM, Room Hall A
More from the Same Authors
- 2022 Poster: On Implicit Bias in Overparameterized Bilevel Optimization
  Paul Vicol · Jonathan Lorraine · Fabian Pedregosa · David Duvenaud · Roger Grosse
- 2022 Spotlight: On Implicit Bias in Overparameterized Bilevel Optimization
  Paul Vicol · Jonathan Lorraine · Fabian Pedregosa · David Duvenaud · Roger Grosse
- 2021 Poster: Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition
  Shengyang Sun · Jiaxin Shi · Andrew Wilson · Roger Grosse
- 2021 Spotlight: Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition
  Shengyang Sun · Jiaxin Shi · Andrew Wilson · Roger Grosse
- 2021 Poster: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
  Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy
- 2021 Spotlight: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
  Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy
- 2021 Poster: On Monotonic Linear Interpolation of Neural Network Parameters
  James Lucas · Juhan Bae · Michael Zhang · Stanislav Fort · Richard Zemel · Roger Grosse
- 2021 Spotlight: On Monotonic Linear Interpolation of Neural Network Parameters
  James Lucas · Juhan Bae · Michael Zhang · Stanislav Fort · Richard Zemel · Roger Grosse
- 2020 Poster: Evaluating Lossy Compression Rates of Deep Generative Models
  Sicong Huang · Alireza Makhzani · Yanshuai Cao · Roger Grosse
- 2019 Poster: EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis
  Chaoqi Wang · Roger Grosse · Sanja Fidler · Guodong Zhang
- 2019 Oral: EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis
  Chaoqi Wang · Roger Grosse · Sanja Fidler · Guodong Zhang
- 2018 Poster: Noisy Natural Gradient as Variational Inference
  Guodong Zhang · Shengyang Sun · David Duvenaud · Roger Grosse
- 2018 Poster: Distilling the Posterior in Bayesian Neural Networks
  Kuan-Chieh Wang · Paul Vicol · James Lucas · Li Gu · Roger Grosse · Richard Zemel
- 2018 Oral: Noisy Natural Gradient as Variational Inference
  Guodong Zhang · Shengyang Sun · David Duvenaud · Roger Grosse
- 2018 Oral: Distilling the Posterior in Bayesian Neural Networks
  Kuan-Chieh Wang · Paul Vicol · James Lucas · Li Gu · Roger Grosse · Richard Zemel
- 2018 Poster: Differentiable Compositional Kernel Learning for Gaussian Processes
  Shengyang Sun · Guodong Zhang · Chaoqi Wang · Wenyuan Zeng · Jiaman Li · Roger Grosse
- 2018 Oral: Differentiable Compositional Kernel Learning for Gaussian Processes
  Shengyang Sun · Guodong Zhang · Chaoqi Wang · Wenyuan Zeng · Jiaman Li · Roger Grosse