Increasing the capacity of neural network architectures by varying their width and depth has been central to their success. However, recent work has shown that in overparameterized models, the hidden representations exhibit a block structure: a large set of contiguous layers with highly similar representations. In this paper, we investigate how this block structure arises, its connection to the data, and the relationship between training mechanisms and the block structure. We begin by showing that the representations within the block structure are robust to small out-of-distribution shifts in the data. Leveraging insights connecting the block structure to the first principal component of the representations, we then demonstrate that the block structure arises from a small group of examples with similar image statistics. These examples have very large activation norms and dominate the representational geometry of intermediate network layers. While these "dominant" datapoints are similar across all layers inside the block structure of a single network, different training runs lead to different sets of dominant datapoints. With these insights, we take an interventional approach, introducing a method to regularize the block structure, and exploring how popular training mechanisms that improve performance can eliminate the block structure from the internal representations of overparameterized models.
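The abstract leans on two measurable quantities: layer-wise representation similarity (building on the CKA measure from the authors' 2019 paper "Similarity of Neural Network Representations Revisited", listed below) and per-example activation norms at intermediate layers. The sketch below shows, under stated assumptions, how one might surface both signals with NumPy. It is a minimal illustration, not the paper's released code: the activation matrices here are random stand-ins for real hidden-layer activations, and the `linear_cka` function implements the standard linear-kernel CKA formula.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two representation matrices (examples x features),
    following the formula in Kornblith et al. (2019)."""
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(y.T @ x, "fro") ** 2
    norm = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return hsic / norm

# Stand-in activations: in practice, one matrix per hidden layer, computed
# from a trained network on a fixed batch of inputs.
rng = np.random.default_rng(0)
n_examples, n_layers = 256, 12
acts = [rng.standard_normal((n_examples, 64)) for _ in range(n_layers)]

# Pairwise CKA across layers; with real activations, a large bright square
# of contiguous layers in this matrix is the "block structure" described above.
cka = np.zeros((n_layers, n_layers))
for i in range(n_layers):
    for j in range(n_layers):
        cka[i, j] = linear_cka(acts[i], acts[j])

# Candidate "dominant" datapoints: examples with unusually large activation
# norms at an intermediate layer.
mid = acts[n_layers // 2]
norms = np.linalg.norm(mid, axis=1)
dominant = np.argsort(norms)[-10:]  # indices of the 10 highest-norm examples

# First principal component of the (centered) intermediate representations;
# the abstract's claim is that the high-norm examples dominate this direction.
centered = mid - mid.mean(axis=0, keepdims=True)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1_scores = centered @ vt[0]  # per-example projection onto the first PC
```

With real activations, the high-norm `dominant` set would be expected to have outsized `pc1_scores`, and comparing `dominant` indices across independent training runs would exhibit the run-to-run variation the abstract reports.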
Author Information
Thao Nguyen (Google)
Maithra Raghu (Google)
Simon Kornblith (Google Brain)
More from the Same Authors
- 2023 Poster: On the Relationship Between Explanation and Prediction: A Causal View
  Amir-Hossein Karimi · Krikamol Muandet · Simon Kornblith · Bernhard Schölkopf · Been Kim
- 2022 Poster: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Mitchell Wortsman · Gabriel Ilharco · Samir Gadre · Rebecca Roelofs · Raphael Gontijo Lopes · Ari Morcos · Hongseok Namkoong · Ali Farhadi · Yair Carmon · Simon Kornblith · Ludwig Schmidt
- 2022 Spotlight: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Mitchell Wortsman · Gabriel Ilharco · Samir Gadre · Rebecca Roelofs · Raphael Gontijo Lopes · Ari Morcos · Hongseok Namkoong · Ali Farhadi · Yair Carmon · Simon Kornblith · Ludwig Schmidt
- 2021 Poster: Generalised Lipschitz Regularisation Equals Distributional Robustness
  Zac Cranko · Zhan Shi · Xinhua Zhang · Richard Nock · Simon Kornblith
- 2021 Spotlight: Generalised Lipschitz Regularisation Equals Distributional Robustness
  Zac Cranko · Zhan Shi · Xinhua Zhang · Richard Nock · Simon Kornblith
- 2020 Poster: Concept Bottleneck Models
  Pang Wei Koh · Thao Nguyen · Yew Siang Tang · Stephen Mussmann · Emma Pierson · Been Kim · Percy Liang
- 2020 Poster: Revisiting Spatial Invariance with Low-Rank Local Connectivity
  Gamaleldin Elsayed · Prajit Ramachandran · Jon Shlens · Simon Kornblith
- 2020 Poster: A Simple Framework for Contrastive Learning of Visual Representations
  Ting Chen · Simon Kornblith · Mohammad Norouzi · Geoffrey Hinton
- 2019 Poster: Similarity of Neural Network Representations Revisited
  Simon Kornblith · Mohammad Norouzi · Honglak Lee · Geoffrey Hinton
- 2019 Oral: Similarity of Neural Network Representations Revisited
  Simon Kornblith · Mohammad Norouzi · Honglak Lee · Geoffrey Hinton