Poster in Workshop: High-dimensional Learning Dynamics Workshop: The Emergence of Structure and Reasoning
Where Do Large Learning Rates Lead Us? A Feature Learning Perspective
Ildus Sadrtdinov · Maxim Kodryan · Eduard Pokonechny · Ekaterina Lobacheva · Dmitry Vetrov
It is conventional wisdom that using large learning rates (LRs) early in training improves generalization. Following a line of research devoted to understanding this effect mechanistically, we conduct an empirical study in a controlled setting, focusing on the feature learning properties of training with different initial LRs. We show that the range of initial LRs that yields the best generalization of the final solution produces a sparse set of learned features, with a clear focus on those most relevant to the task. In contrast, training that starts with LRs that are too small attempts to learn all features simultaneously, resulting in poor generalization. At the other extreme, initial LRs that are too large fail to extract meaningful patterns from the data.