Invited Talk in Workshop: The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward

How Neural Networks See, Learn and Forget

Maithra Raghu


Abstract:

Neural networks have been at the heart of machine learning breakthroughs for over a decade. But in just the past couple of years, new advances in model architectures, pretraining, and scaling have challenged our assumptions about how they function. In this talk I provide some insights into the workings of modern machine learning. Motivated by the ubiquity of Transformer architectures across tasks and data modalities, I discuss the recent successes of Transformers in computer vision and their key similarities to and differences from convolutional architectures. Next, I give an overview of some salient effects of pretraining and scale on Transformer representations. I draw connections to results on catastrophic forgetting, the way in which forgetting manifests in representations, and new mitigation methods suggested by these insights. I conclude with some open questions in these directions.