
Multi-scale Feature Learning Dynamics: Insights for Double Descent
Mohammad Pezeshki · Amartya Mitra · Yoshua Bengio · Guillaume Lajoie

Thu Jul 21 08:30 AM -- 08:35 AM (PDT) @ Ballroom 1 & 2

An intriguing phenomenon arising from the high-dimensional learning dynamics of neural networks is "double descent". The more commonly studied aspect of this phenomenon is model-wise double descent, in which the test error exhibits a second descent with increasing model complexity, beyond the classical U-shaped error curve. In this work, we investigate the origins of the less studied epoch-wise double descent, in which the test error undergoes two non-monotonic transitions, or descents, as training time increases. We study a linear teacher-student setup exhibiting epoch-wise double descent similar to that in deep neural networks. In this setting, we derive closed-form analytical expressions describing the generalization error in terms of low-dimensional scalar macroscopic variables. We find that double descent can be attributed to distinct features being learned at different scales: as fast-learning features overfit, slower-learning features start to fit, resulting in a second descent in test error. We validate our findings through numerical simulations, in which our theory accurately predicts empirical findings and remains consistent with observations in deep neural networks.
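
The mechanism described in the abstract can be explored with a small numerical sketch. The code below is not the authors' setup or their closed-form theory. It is a minimal illustration under assumptions: a linear student trained by full-batch gradient descent on data from a noisy linear teacher, with input features split into a large-variance ("fast") group and a small-variance ("slow") group. The function name `simulate_two_scale_regression` and all parameter values are hypothetical. Whether the resulting test-error curve shows a visible second descent depends on the chosen scales, noise level, and sample size.

```python
import numpy as np


def simulate_two_scale_regression(n_train=100, d=80, fast_scale=1.0,
                                  slow_scale=0.1, noise_std=0.5,
                                  lr=0.05, steps=5000, seed=0):
    """Train a linear student on a noisy linear teacher (illustrative only).

    Half of the input features have standard deviation `fast_scale` and the
    rest have `slow_scale`, so gradient descent fits the two groups on
    different time scales. All defaults are assumptions for illustration.
    Returns (train_err, test_err), each an array of length `steps`.
    """
    rng = np.random.default_rng(seed)

    # Per-feature standard deviations: a fast group followed by a slow group.
    scales = np.concatenate([np.full(d // 2, fast_scale),
                             np.full(d - d // 2, slow_scale)])

    # Teacher weights and noisy training labels.
    w_teacher = rng.standard_normal(d) / np.sqrt(d)
    X = rng.standard_normal((n_train, d)) * scales
    y = X @ w_teacher + noise_std * rng.standard_normal(n_train)

    w = np.zeros(d)  # student starts at zero
    train_err = np.empty(steps)
    test_err = np.empty(steps)

    for t in range(steps):
        residual = X @ w - y
        w -= lr * (X.T @ residual) / n_train  # gradient step on 0.5 * MSE

        train_err[t] = np.mean((X @ w - y) ** 2)

        # Population test error for zero-mean Gaussian inputs with diagonal
        # covariance diag(scales**2), plus the label-noise floor.
        diff = w - w_teacher
        test_err[t] = np.sum((scales * diff) ** 2) + noise_std ** 2

    return train_err, test_err
```

Plotting `test_err` against the step index on a logarithmic time axis, for several values of `slow_scale`, is one way to look for the fast-then-slow fitting pattern the abstract describes.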

Author Information

Mohammad Pezeshki (Mila, Université de Montréal)
Amartya Mitra (Capgemini America Inc.)

I lead the Methods & Engineering group at Capgemini America, where my work focuses on incorporating state-of-the-art ML methodologies into financial services. Prior to this, I received my doctorate in theoretical physics from UC Riverside, where I worked on game optimization and generalization dynamics.

Yoshua Bengio (Mila - Quebec AI Institute)
Guillaume Lajoie (Mila, Université de Montréal)
