Neural networks have been at the heart of machine learning breakthroughs for over a decade. But in just the past couple of years, new advances in model architectures, pretraining and scaling have challenged our assumptions about how these networks function. In this talk I provide some insights into the workings of modern machine learning. Motivated by the ubiquity of Transformer architectures across tasks and data modalities, I discuss the recent successes of Transformers in computer vision and their key similarities to and differences from convolutional architectures. Next, I give an overview of some salient effects of pretraining and scale on Transformer representations. I then draw connections to results on catastrophic forgetting: how forgetting manifests in representations, and the new mitigation methods these insights suggest. I conclude with some open questions in these directions.
Author Information
Maithra Raghu (Samaya AI)
More from the Same Authors
2022 Workshop: Knowledge Retrieval and Language Models
Maithra Raghu · Urvashi Khandelwal · Chiyuan Zhang · Matei Zaharia · Alexander Rush
2019 Workshop: Identifying and Understanding Deep Learning Phenomena
Hanie Sedghi · Samy Bengio · Kenji Hata · Aleksander Madry · Ari Morcos · Behnam Neyshabur · Maithra Raghu · Ali Rahimi · Ludwig Schmidt · Ying Xiao
2019 Poster: Direct Uncertainty Prediction for Medical Second Opinions
Maithra Raghu · Katy Blumer · Rory Sayres · Ziad Obermeyer · Bobby Kleinberg · Sendhil Mullainathan · Jon Kleinberg
2019 Oral: Direct Uncertainty Prediction for Medical Second Opinions
Maithra Raghu · Katy Blumer · Rory Sayres · Ziad Obermeyer · Bobby Kleinberg · Sendhil Mullainathan · Jon Kleinberg
2018 Poster: Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?
Maithra Raghu · Alexander Irpan · Jacob Andreas · Bobby Kleinberg · Quoc Le · Jon Kleinberg
2018 Oral: Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?
Maithra Raghu · Alexander Irpan · Jacob Andreas · Bobby Kleinberg · Quoc Le · Jon Kleinberg
2017 Poster: On the Expressive Power of Deep Neural Networks
Maithra Raghu · Ben Poole · Surya Ganguli · Jon Kleinberg · Jascha Sohl-Dickstein
2017 Talk: On the Expressive Power of Deep Neural Networks
Maithra Raghu · Ben Poole · Surya Ganguli · Jon Kleinberg · Jascha Sohl-Dickstein