
Implicit Bias of the Step Size in Linear Diagonal Neural Networks
Mor Shpigel Nacson · Kavya Ravichandran · Nati Srebro · Daniel Soudry

Wed Jul 20 08:45 AM -- 08:50 AM (PDT) @ Ballroom 1 & 2
Focusing on diagonal linear networks as a model for understanding the implicit bias in underdetermined models, we show that the gradient descent step size can have a large qualitative effect on the implicit bias, and thus on generalization. In particular, we show that using a large step size on non-centered data can shift the implicit bias from "kernel"-type behavior to the "rich" (sparsity-inducing) regime, even when gradient flow, studied in previous works, would not escape the "kernel" regime. We do so using dynamic stability: we prove that convergence to dynamically stable global minima entails a bound on a weighted $\ell_1$-norm of the linear predictor, i.e., a "rich" regime. We prove this leads to good generalization in a sparse regression setting.
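The abstract's key tool is the dynamic (linear) stability of gradient descent at a minimum. As a minimal sketch, assuming the common diagonal-network parameterization $f(x) = \langle u \odot v, x \rangle$ with squared loss (the paper's exact parameterization and its weighted $\ell_1$ bound are not reproduced here), the Python below runs fixed-step GD and applies the standard stability test: a minimum is linearly stable for step size $\eta$ only if the top Hessian eigenvalue (the "sharpness") is at most $2/\eta$. All function names are hypothetical; this is an illustration, not the authors' code.

```python
import math

# Hypothetical sketch. Assumed model: f(x) = sum_k u_k * v_k * x_k (diagonal linear network).


def predict(u, v, x):
    """Output of the assumed diagonal linear network on input x."""
    return sum(uk * vk * xk for uk, vk, xk in zip(u, v, x))


def loss(X, y, u, v):
    """Squared loss L = (1 / 2n) * sum_i (f(x_i) - y_i)^2."""
    n = len(X)
    return sum((predict(u, v, x) - yi) ** 2 for x, yi in zip(X, y)) / (2 * n)


def gd(X, y, u, v, lr, steps):
    """Plain full-batch gradient descent on (u, v) with a fixed step size lr."""
    n, d = len(X), len(u)
    u, v = list(u), list(v)
    for _ in range(steps):
        res = [predict(u, v, x) - yi for x, yi in zip(X, y)]
        # g_k = (1/n) * sum_i r_i * x_ik, shared by both partial derivatives
        g = [sum(r * x[k] for r, x in zip(res, X)) / n for k in range(d)]
        # dL/du_k = g_k * v_k and dL/dv_k = g_k * u_k, both evaluated at the old (u, v)
        u, v = ([uk - lr * gk * vk for uk, vk, gk in zip(u, v, g)],
                [vk - lr * gk * uk for uk, vk, gk in zip(u, v, g)])
    return u, v


def sharpness_at_interpolation(X, u, v, iters=500):
    """Top Hessian eigenvalue of L at a (near) zero-residual point (u, v).

    With zero residuals the Hessian reduces to (1/n) J^T J, whose nonzero
    eigenvalues match those of the n x n matrix M = (1/n) X diag(u^2 + v^2) X^T.
    The top eigenvalue of M is estimated by power iteration.
    """
    n = len(X)
    w = [uk * uk + vk * vk for uk, vk in zip(u, v)]
    M = [[sum(wk * xi[k] * xj[k] for k, wk in enumerate(w)) / n for xj in X]
         for xi in X]
    # Unit-norm start vector; assumed not orthogonal to the top eigenvector.
    z = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        mz = [sum(M[i][j] * z[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in mz))
        if norm == 0.0:
            return 0.0
        z = [c / norm for c in mz]
    mz = [sum(M[i][j] * z[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(z, mz))  # Rayleigh quotient z^T M z


def is_linearly_stable(X, u, v, lr):
    """Linear-stability test for GD at a minimum: sharpness <= 2 / lr."""
    return sharpness_at_interpolation(X, u, v) <= 2.0 / lr


# Usage on a toy problem (illustrative data, not from the paper).
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 0.0]
u, v = gd(X, y, [1.0, 1.0], [0.5, 0.5], lr=0.1, steps=2000)
print(loss(X, y, u, v), is_linearly_stable(X, u, v, lr=0.1))
```

Because a stable limit point must satisfy sharpness $\le 2/\eta$, a larger step size excludes sharper global minima; per the abstract, the paper turns this constraint into a weighted $\ell_1$-norm bound, which this sketch does not attempt.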

Author Information

Mor Shpigel Nacson (Technion)
Kavya Ravichandran (Toyota Technological Institute at Chicago)
Nati Srebro (Toyota Technological Institute at Chicago)
Daniel Soudry (Technion)
