

Multi-scale Feature Learning Dynamics: Insights for Double Descent

Mohammad Pezeshki · Amartya Mitra · Yoshua Bengio · Guillaume Lajoie

Hall E #221

Keywords: [ DL: Theory ] [ T: Deep Learning ] [ DL: Everything Else ]


An intriguing phenomenon that arises from the high-dimensional learning dynamics of neural networks is "double descent". The more commonly studied form is model-wise double descent, in which the test error exhibits a second descent with increasing model complexity, beyond the classical U-shaped error curve. In this work, we investigate the origins of the less studied epoch-wise double descent, in which the test error undergoes two non-monotonic transitions, or descents, as training time increases. We study a linear teacher-student setup that exhibits epoch-wise double descent similar to that observed in deep neural networks. In this setting, we derive closed-form analytical expressions for the generalization error in terms of low-dimensional scalar macroscopic variables. We find that double descent can be attributed to distinct features being learned at different scales: as fast-learning features overfit, slower-learning features start to fit, resulting in a second descent in test error. We validate our findings through numerical simulations, in which our theory accurately predicts the empirical results and remains consistent with observations in deep neural networks.
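The mechanism described above, in which fast features overfit while slow features are still being fit, can be illustrated with a deliberately small toy. The sketch below is not the authors' teacher-student model or their closed-form expressions. It assumes two independent modes trained by gradient descent on a quadratic loss from zero initialization, and it assumes the test error is the curvature-weighted squared distance to the teacher. The names `mode_weight`, `generalization_curve`, and `MODES`, and every numeric value, are hypothetical choices made for illustration.

```python
def mode_weight(train_opt, curvature, lr, step):
    """Weight of one mode after `step` gradient-descent steps from zero.

    For the loss 0.5 * curvature * (w - train_opt)**2, each step multiplies
    the remaining gap to `train_opt` by (1 - lr * curvature).
    """
    return train_opt * (1.0 - (1.0 - lr * curvature) ** step)


def generalization_curve(modes, lr, steps):
    """Toy test error at every step from 0 to `steps`, summed over modes."""
    curve = []
    for t in range(steps + 1):
        err = 0.0
        for m in modes:
            w = mode_weight(m["train_opt"], m["curvature"], lr, t)
            # Assumption: test error weights each mode by its curvature.
            err += m["curvature"] * (w - m["teacher"]) ** 2
        curve.append(err)
    return curve


# Hypothetical values. The fast mode's training optimum is shifted away
# from the teacher (label noise), so fitting it fully means overfitting.
# The slow mode is noise-free but learns 50x more slowly.
MODES = [
    {"curvature": 1.0, "teacher": 1.0, "train_opt": 1.6},   # fast, noisy
    {"curvature": 0.02, "teacher": 5.0, "train_opt": 5.0},  # slow, clean
]

curve = generalization_curve(MODES, lr=0.2, steps=2000)

# Locate the first local minimum and the peak that follows it.
first_min = next(i for i in range(len(curve) - 1) if curve[i + 1] > curve[i])
peak = next(i for i in range(first_min, len(curve) - 1) if curve[i + 1] < curve[i])

if __name__ == "__main__":
    print("first minimum at step", first_min, "error", round(curve[first_min], 4))
    print("peak at step", peak, "error", round(curve[peak], 4))
    print("final error", round(curve[-1], 4))
```

Under these assumed values the toy error first falls as the fast mode approaches the teacher, rises as that mode overshoots toward its noisy training optimum, and falls again once the slow mode catches up. The separation between the two curvatures is what separates the two descents in time. With similar curvatures the two phases would overlap and the intermediate bump could disappear.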
