In min-min or max-min optimization, one has to compute the gradient of a function defined as a minimum. In most cases, the minimum has no closed form, and an approximation is obtained via an iterative algorithm. There are two usual ways of estimating the gradient of the function: using either an analytic formula obtained by assuming exactness of the approximation, or automatic differentiation through the algorithm. In this paper, we study the asymptotic error made by these estimators as a function of the optimization error. We find that the error of the automatic estimator is close to the square of the error of the analytic estimator, reflecting a super-efficiency phenomenon. The convergence of the automatic estimator greatly depends on the convergence of the Jacobian of the algorithm. We analyze it for gradient descent and stochastic gradient descent and derive convergence rates for the estimators in these cases. Our analysis is backed by numerical experiments on toy problems and on Wasserstein barycenter computation. Finally, we discuss the computational complexity of these estimators and give practical guidelines to choose between them.
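To make the two estimators concrete, here is a minimal JAX sketch (not the paper's code): an inner objective F(z, x) is approximately minimized over z by gradient descent, and the gradient of ell(x) = min_z F(z, x) is estimated either analytically via the envelope formula at the approximate minimizer, or automatically by differentiating through the iterations. The function names F, inner_gd, grad_analytic, and grad_automatic, as well as the quadratic toy objective, are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

# Toy inner objective F(z, x); the outer function is ell(x) = min_z F(z, x).
def F(z, x):
    return 0.5 * jnp.sum((z - x) ** 2) + 0.1 * jnp.sum(z ** 2)

def inner_gd(x, z0, n_steps=20, step=0.5):
    """Approximate the minimizer z*(x) with n_steps of gradient descent on z -> F(z, x)."""
    z = z0
    for _ in range(n_steps):
        z = z - step * jax.grad(F, argnums=0)(z, x)
    return z

def grad_analytic(x, z0, n_steps=20):
    """Analytic estimator: evaluate d/dx F(z, x) at the approximate minimizer,
    treating z as a constant (the dependence of z on x is ignored)."""
    z = jax.lax.stop_gradient(inner_gd(x, z0, n_steps))
    return jax.grad(F, argnums=1)(z, x)

def grad_automatic(x, z0, n_steps=20):
    """Automatic estimator: differentiate through the gradient-descent iterations,
    so the Jacobian of the iterates with respect to x is propagated."""
    def ell_approx(x):
        z = inner_gd(x, z0, n_steps)
        return F(z, x)
    return jax.grad(ell_approx)(x)

x = jnp.array([1.0, -2.0, 0.5])
z0 = jnp.zeros(3)
print(grad_analytic(x, z0), grad_automatic(x, z0))
```

With this quadratic toy objective the exact gradient is x / 6, so one can vary n_steps to observe how fast each estimator's error decreases with the optimization error.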
Author Information
Pierre Ablin (CNRS and ENS)
Gabriel Peyré (CNRS and ENS)
Thomas Moreau (Inria)
More from the Same Authors
- 2023 Poster: FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels
  Guillaume Staerman · Cédric Allain · Alexandre Gramfort · Thomas Moreau
- 2023 Poster: Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective
  Michael Sander · Joan Puigcerver · Josip Djolonga · Gabriel Peyré · Mathieu Blondel
- 2023 Poster: Sliced-Wasserstein on Symmetric Positive Definite Matrices for M/EEG Signals
  Clément Bonet · Benoît Malézieux · alain rakotomamonjy · Lucas Drumetz · Thomas Moreau · Matthieu Kowalski · Nicolas Courty
- 2023 Poster: Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps
  Marco Cuturi · Michal Klein · Pierre Ablin
- 2022 Poster: Unsupervised Ground Metric Learning Using Wasserstein Singular Vectors
  Geert-Jan Huizing · Laura Cantini · Gabriel Peyré
- 2022 Spotlight: Unsupervised Ground Metric Learning Using Wasserstein Singular Vectors
  Geert-Jan Huizing · Laura Cantini · Gabriel Peyré
- 2022 Poster: Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs
  Meyer Scetbon · Gabriel Peyré · Marco Cuturi
- 2022 Spotlight: Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs
  Meyer Scetbon · Gabriel Peyré · Marco Cuturi
- 2021 Poster: Kernel Stein Discrepancy Descent
  Anna Korba · Pierre-Cyril Aubin-Frankowski · Szymon Majewski · Pierre Ablin
- 2021 Oral: Kernel Stein Discrepancy Descent
  Anna Korba · Pierre-Cyril Aubin-Frankowski · Szymon Majewski · Pierre Ablin
- 2021 Poster: Low-Rank Sinkhorn Factorization
  Meyer Scetbon · Marco Cuturi · Gabriel Peyré
- 2021 Poster: Momentum Residual Neural Networks
  Michael Sander · Pierre Ablin · Mathieu Blondel · Gabriel Peyré
- 2021 Spotlight: Momentum Residual Neural Networks
  Michael Sander · Pierre Ablin · Mathieu Blondel · Gabriel Peyré
- 2021 Spotlight: Low-Rank Sinkhorn Factorization
  Meyer Scetbon · Marco Cuturi · Gabriel Peyré
- 2019 Poster: Geometric Losses for Distributional Learning
  Arthur Mensch · Mathieu Blondel · Gabriel Peyré
- 2019 Oral: Geometric Losses for Distributional Learning
  Arthur Mensch · Mathieu Blondel · Gabriel Peyré
- 2019 Poster: Stochastic Deep Networks
  Gwendoline De Bie · Gabriel Peyré · Marco Cuturi
- 2019 Oral: Stochastic Deep Networks
  Gwendoline De Bie · Gabriel Peyré · Marco Cuturi