Skip to yearly menu bar Skip to main content

Workshop: The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward

Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?

Nenad Tomasev · Ioana Bica · Brian McWilliams · Lars Buesing · Razvan Pascanu · Charles Blundell · Jovana Mitrovic


Despite recent progress made by self-supervised methods in representation learning with residual networks, they still underperform supervised learning on the ImageNet classification benchmark. To address this, we propose a novel self-supervised representation learning method Representation Learning via Invariant Causal Mechanisms v2 (ReLICv2) (based on ReLIC (Mitrovic et al., 2021)) which explicitly enforces invariance over spurious features such as background and object style. We conduct an extensive experimental evaluation across a varied set of datasets, learning settings and tasks. ReLICv2 achieves 77.1% top-1 accuracy on ImageNet using linear evaluation with a ResNet50 architecture and 80.6% with larger ResNet models, outperforming previous state-of-the-art self-supervised approaches by a wide margin. Moreover, we show a relative overall improvement exceeding +5% over the supervised baseline in the transfer setting and the ability to learn more robust representations than self-supervised and supervised models. Most notably, ReLICv2 is the first unsupervised representation learning method to consistently outperform a standard supervised baseline in a like-for-like comparison across a wide range of ResNet architectures. Finally, we show that despite using ResNet encoders, ReLICv2 is comparable to state-of-the-art self-supervised vision transformers.

Chat is not available.