
Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers
Liam Hodgkinson · Umut Simsekli · Rajiv Khanna · Michael Mahoney

Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood. While recent work has revealed connections between generalization and heavy-tailed behavior in stochastic optimization, it has mainly relied on continuous-time approximations, and a rigorous treatment of the original discrete-time iterations is yet to be performed. To bridge this gap, we present novel bounds linking generalization to the \emph{lower tail exponent} of the transition kernel associated with the optimizer around a local minimum, in \emph{both} discrete- and continuous-time settings. To achieve this, we first prove a data- and algorithm-dependent generalization bound in terms of the celebrated Fernique--Talagrand functional applied to the trajectory of the optimizer. We then specialize this result by exploiting the Markovian structure of stochastic optimizers, deriving bounds in terms of their (data-dependent) transition kernels. We support our theory with empirical results from a variety of neural networks, showing correlations between generalization error and lower tail exponents.
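As background for the abstract's central quantity, a tail exponent can be estimated from samples. The sketch below is purely illustrative and is not the paper's estimator: it applies the classical Hill estimator (an upper-tail index estimator) to synthetic Pareto-distributed data, standing in for, e.g., the magnitudes of optimizer increments; the sample size, truncation level `k`, and the synthetic data source are all assumptions for illustration.

```python
import numpy as np

def hill_estimator(samples, k):
    """Hill estimator of the tail index using the k largest order statistics.

    For data with a power-law tail P(X > x) ~ x^{-alpha}, the estimator
    k / sum_{i<=k} log(X_(i) / X_(k+1)) is consistent for alpha as k -> inf
    with k/n -> 0 (X_(1) >= X_(2) >= ... are descending order statistics).
    """
    x = np.sort(np.abs(samples))[::-1]          # sort magnitudes, largest first
    log_ratios = np.log(x[:k]) - np.log(x[k])   # log of k largest over the (k+1)-th
    return k / log_ratios.sum()

# Synthetic heavy-tailed data: classical Pareto with index alpha_true = 2.0.
# (numpy's rng.pareto draws Lomax samples on [0, inf); adding 1 gives Pareto.)
rng = np.random.default_rng(0)
alpha_true = 2.0
data = rng.pareto(alpha_true, size=100_000) + 1.0
est = hill_estimator(data, k=1000)
```

On this synthetic data the estimate lands close to the true index of 2.0; in practice the choice of `k` trades bias against variance, and estimating *lower* tail exponents of a transition kernel (as in the paper) requires different machinery.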

Author Information

Liam Hodgkinson (UC Berkeley)
Umut Simsekli (Inria/ENS)
Rajiv Khanna (UC Berkeley)
Michael Mahoney (UC Berkeley)
