Workshop: Over-parameterization: Pitfalls and Opportunities

On the Origins of the Block Structure Phenomenon in Neural Network Representations

Thao Nguyen · Maithra Raghu · Simon Kornblith


Increasing the capacity of neural network architectures by varying their width and depth has been central to their success. However, recent work has shown that in overparameterized models, the hidden representations exhibit a block structure --- a large set of contiguous layers with highly similar representations. In this paper, we investigate how this block structure arises, its connection to the data, and the relationship between training mechanisms and the block structure. We begin by showing that representations within the block structure are robust to small out-of-distribution shifts in the data. Leveraging insights connecting the block structure and the first principal component of the representations, we then demonstrate that the block structure arises from a small group of examples with similar image statistics. These examples have very large activation norms and dominate the representational geometry of intermediate network layers. While these "dominant" datapoints are similar across all layers inside the block structure of a single network, different training runs lead to different sets of dominant datapoints. With these insights, we take an interventional approach, introducing a method to regularize the block structure, and also exploring how popular training mechanisms that improve performance can eliminate the block structure in the internal representations of overparameterized models.
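The abstract's two key quantities can be sketched numerically. The block structure in this line of work is revealed by comparing layer representations with linear centered kernel alignment (CKA), and its connection to the first principal component can be probed by measuring how much variance that component explains. The sketch below is illustrative only and is not the authors' code: the synthetic "layers" `X1` and `X2` are hypothetical activation matrices that share one dominant high-norm direction, standing in for the dominant datapoints described in the abstract.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (examples x features).
    Returns 1.0 for identical representations, ~0 for unrelated ones."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    dot = np.linalg.norm(X.T @ Y, "fro") ** 2
    return dot / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

def first_pc_fraction(X):
    """Fraction of representation variance explained by the first principal component."""
    X = X - X.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)

# Hypothetical activations: two "layers" sharing one dominant direction,
# mimicking a block of layers whose geometry is driven by a few
# high-activation-norm examples.
rng = np.random.default_rng(0)
n, d = 200, 64
shared = 10.0 * rng.normal(size=(n, 1)) @ rng.normal(size=(1, d))
X1 = shared + rng.normal(size=(n, d))
X2 = shared + rng.normal(size=(n, d))

print(f"CKA(X1, X2) = {linear_cka(X1, X2):.3f}")          # high: layers look alike
print(f"first-PC fraction = {first_pc_fraction(X1):.3f}")  # dominant component
```

When many consecutive layers share such a dominant component, their pairwise CKA values form the large, uniformly high block that gives the phenomenon its name; removing or regularizing the dominant direction collapses it.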