Tutorial
Sampling as First-Order Optimization over a space of probability measures
Anna Korba · Adil Salim
Moderator: Amin Karbasi
Ballroom 1 & 2
Sampling from a target probability distribution whose density is only known up to a normalisation constant is a fundamental problem in statistics and machine learning. While the optimization literature for machine learning has developed extensively over the past decade, with precise convergence rates for many methods, the literature on sampling remained mainly asymptotic until very recently. Since then, the machine learning community has become increasingly interested in the non-asymptotic analysis of sampling algorithms, and in designing new schemes to improve the complexity of sampling.

Interestingly, approximating a target probability distribution can be cast as an optimization problem in which the objective functional measures the dissimilarity to the target distribution. In particular, the Kullback-Leibler divergence (or relative entropy) with respect to the target distribution is a suitable objective functional when the normalisation constant is intractable, as is commonly the case in Bayesian inference. This optimization problem can be addressed with optimization techniques over a space of probability measures, and the theory of Wasserstein gradient flows provides the tools to do so. Indeed, Wasserstein gradient flows are continuous paths of distributions that decrease the objective functional. Moreover, several sampling algorithms such as Langevin Monte Carlo or Stein Variational Gradient Descent can be seen as discretisations of a Wasserstein gradient flow.

In this tutorial, we will show how one can leverage optimization techniques to design and analyze sampling algorithms. We will first review fundamental optimization concepts such as Euclidean gradient flows (i.e., continuous-time gradient descent), before introducing their optimal transport analogues, such as Wasserstein gradient flows. Then, we will present an optimization point of view on both standard and novel sampling algorithms, and show how this point of view has led to new convergence results and the design of new schemes.
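To make the connection concrete, consider a target pi(x) proportional to exp(-V(x)) with unknown normalisation constant Z. The objective KL(mu | pi) decomposes as E_mu[V] + E_mu[log mu] + log Z, so the additive constant log Z plays no role in the gradient flow, which is why this objective is usable when Z is intractable. The sketch below (not taken from the tutorial material) illustrates the unadjusted Langevin algorithm as a time discretisation of the Wasserstein gradient flow of the KL divergence; the quadratic potential V, the step size, and the particle counts are illustrative assumptions chosen for the example.

```python
import numpy as np

def grad_V(x):
    # Gradient of the assumed potential V(x) = ||x||^2 / 2,
    # i.e. the target pi is a standard Gaussian; any smooth V could be used.
    return x

def ula(n_particles=1000, dim=2, n_steps=500, step=1e-2, seed=0):
    # Unadjusted Langevin algorithm: each iteration takes a gradient step on
    # the potential term E_mu[V] and adds Gaussian noise accounting for the
    # entropy term, mirroring the Wasserstein gradient flow of KL(. | pi).
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_particles, dim))  # samples from the initial distribution mu_0
    for _ in range(n_steps):
        noise = rng.standard_normal((n_particles, dim))
        x = x - step * grad_V(x) + np.sqrt(2.0 * step) * noise
    return x  # approximate samples from pi, up to discretisation bias

samples = ula()
print(samples.mean(axis=0), samples.var(axis=0))  # should be close to 0 and 1
```

Under these assumptions the empirical mean and variance of the particles should approach those of the standard Gaussian target; Stein Variational Gradient Descent can be obtained from the same optimization viewpoint by replacing the stochastic noise with a deterministic, kernel-based interaction between particles.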