Timezone: »
Poster
Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective
Michael Sander · Joan Puigcerver · Josip Djolonga · Gabriel Peyré · Mathieu Blondel
The top-$k$ operator returns a $k$-sparse vector, where the non-zero values correspond to the $k$ largest values of the input. Unfortunately, because it is a discontinuous function, it is difficult to incorporate in neural networks trained end-to-end with backpropagation. Recent works have considered differentiable relaxations, based either on regularization or perturbation techniques. However, to date, no approach is fully differentiable and sparse. In this paper, we propose new differentiable and sparse top-$k$ operators. We view the top-$k$ operator as a linear program over the permutahedron, the convex hull of permutations. We then introduce a $p$-norm regularization term to smooth out the operator, and show that its computation can be reduced to isotonic optimization. Our framework is significantly more general than the existing one and allows for example to express top-$k$ operators that select values in magnitude. On the algorithmic side, in addition to pool adjacent violator (PAV) algorithms, we propose a new GPU/TPU-friendly Dykstra algorithm to solve isotonic optimization problems. We successfully use our operators to prune weights in neural networks, to fine-tune vision transformers, and as a router in sparse mixture of experts.
Author Information
Michael Sander (Ecole Normale Supérieure de Paris)
Joan Puigcerver (Google DeepMind)
Josip Djolonga (Google)
Gabriel Peyré (CNRS and ENS)
Mathieu Blondel (Google)
More from the Same Authors
-
2022 : SI-Score »
Jessica Yung · Rob Romijnders · Alexander Kolesnikov · Lucas Beyer · Josip Djolonga · Neil Houlsby · Sylvain Gelly · Mario Lucic · Xiaohua Zhai -
2023 Poster: Scaling Vision Transformers to 22 Billion Parameters »
Mostafa Dehghani · Josip Djolonga · Basil Mustafa · Piotr Padlewski · Jonathan Heek · Justin Gilmer · Andreas Steiner · Mathilde Caron · Robert Geirhos · Ibrahim Alabdulmohsin · Rodolphe Jenatton · Lucas Beyer · Michael Tschannen · Anurag Arnab · Xiao Wang · Carlos Riquelme · Matthias Minderer · Joan Puigcerver · Utku Evci · Manoj Kumar · Sjoerd van Steenkiste · Gamaleldin Elsayed · Aravindh Mahendran · Fisher Yu · Avital Oliver · Fantine Huot · Jasmijn Bastings · Mark Collier · Alexey Gritsenko · Vighnesh N Birodkar · Cristina Vasconcelos · Yi Tay · Thomas Mensink · Alexander Kolesnikov · Filip Pavetic · Dustin Tran · Thomas Kipf · Mario Lucic · Xiaohua Zhai · Daniel Keysers · Jeremiah Harmsen · Neil Houlsby -
2023 Oral: Scaling Vision Transformers to 22 Billion Parameters »
Mostafa Dehghani · Josip Djolonga · Basil Mustafa · Piotr Padlewski · Jonathan Heek · Justin Gilmer · Andreas Steiner · Mathilde Caron · Robert Geirhos · Ibrahim Alabdulmohsin · Rodolphe Jenatton · Lucas Beyer · Michael Tschannen · Anurag Arnab · Xiao Wang · Carlos Riquelme · Matthias Minderer · Joan Puigcerver · Utku Evci · Manoj Kumar · Sjoerd van Steenkiste · Gamaleldin Elsayed · Aravindh Mahendran · Fisher Yu · Avital Oliver · Fantine Huot · Jasmijn Bastings · Mark Collier · Alexey Gritsenko · Vighnesh N Birodkar · Cristina Vasconcelos · Yi Tay · Thomas Mensink · Alexander Kolesnikov · Filip Pavetic · Dustin Tran · Thomas Kipf · Mario Lucic · Xiaohua Zhai · Daniel Keysers · Jeremiah Harmsen · Neil Houlsby -
2022 : SI-Score »
Jessica Yung · Rob Romijnders · Alexander Kolesnikov · Lucas Beyer · Josip Djolonga · Neil Houlsby · Sylvain Gelly · Mario Lucic · Xiaohua Zhai -
2022 Poster: Unsupervised Ground Metric Learning Using Wasserstein Singular Vectors »
Geert-Jan Huizing · Laura Cantini · Gabriel Peyré -
2022 Spotlight: Unsupervised Ground Metric Learning Using Wasserstein Singular Vectors »
Geert-Jan Huizing · Laura Cantini · Gabriel Peyré -
2022 Poster: Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs »
Meyer Scetbon · Gabriel Peyré · Marco Cuturi -
2022 Spotlight: Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs »
Meyer Scetbon · Gabriel Peyré · Marco Cuturi -
2021 Poster: Low-Rank Sinkhorn Factorization »
Meyer Scetbon · Marco Cuturi · Gabriel Peyré -
2021 Poster: Momentum Residual Neural Networks »
Michael Sander · Pierre Ablin · Mathieu Blondel · Gabriel Peyré -
2021 Spotlight: Momentum Residual Neural Networks »
Michael Sander · Pierre Ablin · Mathieu Blondel · Gabriel Peyré -
2021 Spotlight: Low-Rank Sinkhorn Factorization »
Meyer Scetbon · Marco Cuturi · Gabriel Peyré -
2020 Poster: Super-efficiency of automatic differentiation for functions defined as a minimum »
Pierre Ablin · Gabriel Peyré · Thomas Moreau -
2020 Poster: Fast Differentiable Sorting and Ranking »
Mathieu Blondel · Olivier Teboul · Quentin Berthet · Josip Djolonga -
2020 Poster: Implicit differentiation of Lasso-type models for hyperparameter optimization »
Quentin Bertrand · Quentin Klopfenstein · Mathieu Blondel · Samuel Vaiter · Alexandre Gramfort · Joseph Salmon -
2019 Poster: Geometric Losses for Distributional Learning »
Arthur Mensch · Mathieu Blondel · Gabriel Peyré -
2019 Oral: Geometric Losses for Distributional Learning »
Arthur Mensch · Mathieu Blondel · Gabriel Peyré -
2019 Poster: Stochastic Deep Networks »
Gwendoline De Bie · Gabriel Peyré · Marco Cuturi -
2019 Oral: Stochastic Deep Networks »
Gwendoline De Bie · Gabriel Peyré · Marco Cuturi -
2018 Poster: Differentiable Dynamic Programming for Structured Prediction and Attention »
Arthur Mensch · Mathieu Blondel -
2018 Oral: Differentiable Dynamic Programming for Structured Prediction and Attention »
Arthur Mensch · Mathieu Blondel -
2018 Poster: SparseMAP: Differentiable Sparse Structured Inference »
Vlad Niculae · Andre Filipe Torres Martins · Mathieu Blondel · Claire Cardie -
2018 Oral: SparseMAP: Differentiable Sparse Structured Inference »
Vlad Niculae · Andre Filipe Torres Martins · Mathieu Blondel · Claire Cardie -
2017 Poster: Soft-DTW: a Differentiable Loss Function for Time-Series »
Marco Cuturi · Mathieu Blondel -
2017 Talk: Soft-DTW: a Differentiable Loss Function for Time-Series »
Marco Cuturi · Mathieu Blondel