
Poster
Multi-scale Feature Learning Dynamics: Insights for Double Descent
Mohammad Pezeshki · Amartya Mitra · Yoshua Bengio · Guillaume Lajoie

Thu Jul 21 03:00 PM -- 05:00 PM (PDT) @ Hall E #221

An intriguing phenomenon that arises from the high-dimensional learning dynamics of neural networks is "double descent". The more commonly studied aspect of this phenomenon is *model-wise* double descent, where the test error exhibits a second descent with increasing model complexity, beyond the classical U-shaped error curve. In this work, we investigate the origins of the less studied *epoch-wise* double descent, in which the test error undergoes two non-monotonic transitions, or descents, as training time increases. We study a linear teacher-student setup exhibiting epoch-wise double descent similar to that observed in deep neural networks. In this setting, we derive closed-form analytical expressions describing the generalization error in terms of low-dimensional scalar macroscopic variables. We find that double descent can be attributed to distinct features being learned at different scales: as fast-learning features overfit, slower-learning features start to fit, resulting in a second descent in test error. We validate our findings through numerical simulations in which our theory accurately predicts empirical results and remains consistent with observations in deep neural networks.
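As a rough illustration of the mechanism described above, the following minimal sketch (an assumed toy setup, not the authors' exact experiment) trains a linear student with full-batch gradient descent on noisy teacher labels, with input features living at two different scales so that one group of directions is learned much faster than the other. All dimensions, scales, and hyperparameters are illustrative choices.

```python
import numpy as np

# Toy linear teacher-student setup (assumed, for illustration only):
# "fast" feature directions have large variance and are fit quickly,
# "slow" directions have small variance and are fit much later.
rng = np.random.default_rng(0)
d_fast, d_slow = 30, 270            # hypothetical split of feature dimensions
d = d_fast + d_slow
n_train, n_test = 100, 2000
noise_std = 0.5

scales = np.concatenate([np.full(d_fast, 5.0), np.full(d_slow, 0.2)])

def sample(n):
    # Gaussian inputs with per-coordinate scales
    return rng.standard_normal((n, d)) * scales

w_teacher = rng.standard_normal(d) / np.sqrt(d)
X_tr, X_te = sample(n_train), sample(n_test)
y_tr = X_tr @ w_teacher + noise_std * rng.standard_normal(n_train)
y_te = X_te @ w_teacher                   # noiseless test targets

w = np.zeros(d)
lr = 2e-3
test_err = []
for step in range(30000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / n_train   # full-batch gradient
    w -= lr * grad
    if step % 100 == 0:
        test_err.append(np.mean((X_te @ w - y_te) ** 2))

# In this kind of setup the test error typically dips as fast directions fit
# the signal, rises as they overfit the label noise, then descends again once
# the slow directions are eventually learned.
print(f"min test error: {min(test_err):.3f}, final: {test_err[-1]:.3f}")
```

Plotting `test_err` against training steps gives a curve whose two descents reflect the two learning timescales; the precise shape depends on the chosen scales, sample size, and noise level.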

#### Author Information

##### Amartya Mitra (Capgemini America Inc.)

I lead the Methods & Engineering group at Capgemini America, where my work focuses on incorporating state-of-the-art ML methodologies into financial services. Prior to this, I received my doctorate in theoretical physics from UC Riverside, where I worked on game optimization and generalization dynamics.