Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty
Thomas George · Guillaume Lajoie · Aristide Baratin
Event URL: https://openreview.net/forum?id=2F8ktRFqvnM

A recent line of work has identified a so-called ‘lazy regime’, in which a deep network is well approximated throughout training by its linearization around initialization. Here we compare how the lazy (linear) and feature-learning (non-linear) regimes affect subgroups of examples grouped by difficulty. Specifically, we show that in the feature-learning regime easier examples are given more weight, so they are learned faster than more difficult ones. We illustrate this phenomenon across different ways of quantifying example difficulty, including C-score, label noise, and the presence of spurious correlations.
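As a rough illustration of what "linearization around initialization" means, the sketch below compares a tiny two-layer network f(x; θ) with its first-order Taylor expansion f_lin(x; θ) = f(x; θ₀) + ∇θ f(x; θ₀)·(θ − θ₀). The architecture (a tanh hidden layer with a 1/√h output scaling), the sizes, and every function name here are illustrative assumptions, not the authors' experimental setup. The sketch only checks that the gap |f − f_lin| shrinks quadratically as θ stays close to θ₀, which is the sense in which a network that barely moves its parameters is well approximated by its linearization.

```python
import numpy as np

# Hypothetical toy model: f(x; theta) = w2 . tanh(W1 x) / sqrt(h), scalar output.
# theta is the flat concatenation [W1.ravel(), w2].

def init_params(d, h, rng):
    """Random initialization theta0 for the assumed two-layer tanh network."""
    W1 = rng.standard_normal((h, d)) / np.sqrt(d)
    w2 = rng.standard_normal(h)
    return np.concatenate([W1.ravel(), w2])

def unpack(theta, d, h):
    """Split the flat parameter vector back into W1 (h x d) and w2 (h,)."""
    return theta[: h * d].reshape(h, d), theta[h * d:]

def forward(theta, x, d, h):
    """Non-linear model output f(x; theta)."""
    W1, w2 = unpack(theta, d, h)
    return w2 @ np.tanh(W1 @ x) / np.sqrt(h)

def grad_params(theta, x, d, h):
    """Analytic gradient of f(x; theta) w.r.t. theta, flattened like theta."""
    W1, w2 = unpack(theta, d, h)
    a = np.tanh(W1 @ x)
    dW1 = np.outer(w2 * (1.0 - a ** 2), x) / np.sqrt(h)  # df/dW1[i, j]
    dw2 = a / np.sqrt(h)                                  # df/dw2[i]
    return np.concatenate([dW1.ravel(), dw2])

def linearized(theta, theta0, x, d, h):
    """First-order Taylor expansion of f around theta0 (the 'lazy' model)."""
    return forward(theta0, x, d, h) + grad_params(theta0, x, d, h) @ (theta - theta0)

def linearization_errors(theta0, x, d, h, direction, epsilons):
    """|f - f_lin| when theta is moved eps * direction away from theta0."""
    errs = []
    for eps in epsilons:
        theta = theta0 + eps * direction
        errs.append(abs(forward(theta, x, d, h) - linearized(theta, theta0, x, d, h)))
    return errs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, h = 5, 64
    theta0 = init_params(d, h, rng)
    x = rng.standard_normal(d)
    direction = rng.standard_normal(theta0.size)
    epsilons = (1e-2, 1e-3, 1e-4)
    for eps, err in zip(epsilons, linearization_errors(theta0, x, d, h, direction, epsilons)):
        print(f"eps={eps:g}  |f - f_lin| = {err:.2e}")
```

Each tenfold reduction in the parameter displacement should shrink the gap by roughly a hundredfold. Training in the feature-learning regime, by contrast, moves θ far enough from θ₀ that this first-order model no longer tracks the network.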

Author Information

Thomas George (Mila, Université de Montréal)
Guillaume Lajoie (Mila, Université de Montréal)
Aristide Baratin (Mila)
