How Neural Networks See, Learn and Forget
Maithra Raghu

Neural networks have been at the heart of machine learning breakthroughs for over a decade. But in just the past couple of years, new advances in model architectures, pretraining and scaling have challenged our assumptions about how they function. In this talk I provide some insights into the workings of modern machine learning. Motivated by the ubiquity of Transformer architectures across tasks and data modalities, I discuss the recent successes of Transformers in computer vision and their key similarities to and differences from convolutional architectures. Next, I give an overview of some salient effects of pretraining on Transformer representations and the role of scale. I draw connections to results on catastrophic forgetting, the way in which forgetting manifests in representations, and new mitigation methods suggested by these insights. I conclude with some open questions in these directions.

Author Information

Maithra Raghu (Samaya AI)