
Invited Talk
Workshop: The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward

Simplifying and Simplifying Self-Supervised Visual Representation Pre-Training

Xinlei Chen


In this talk, I am going to cover our recent work in self-supervised learning for visual representation pre-training. First is SimSiam, a non-contrastive, momentum-free framework that, to our surprise, can successfully avoid trivial solutions and achieve performance competitive with more complicated methods like MoCo. Second is the Masked Autoencoder (MAE), a further simplification of self-supervised frameworks for computer vision that simply and directly reconstructs the input signal by predicting natural image patches. MAE adopts a BERT-like algorithm with crucial changes for images, and exhibits BERT-like scaling behavior, among other intriguing properties that differ from contrastive learning.
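The random patch masking at the heart of MAE can be sketched as follows. This is an illustrative stand-alone snippet, not the authors' implementation: the function name `random_mask` is made up here, while the 75% mask ratio and the 14×14 patch grid (a 224-pixel image with 16-pixel patches) are MAE's default ViT setup. The encoder processes only the visible patches; a lightweight decoder reconstructs the masked ones.

```python
import random

def random_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into visible and masked sets (MAE-style).

    MAE hides a high fraction (default 75%) of patches chosen uniformly
    at random; only the visible patches are fed to the encoder.
    """
    rng = random.Random(seed)
    ids = list(range(num_patches))
    rng.shuffle(ids)
    num_masked = int(num_patches * mask_ratio)
    masked, visible = ids[:num_masked], ids[num_masked:]
    return sorted(visible), sorted(masked)

# 224x224 image, 16x16 patches -> 14*14 = 196 patches; 49 stay visible.
visible, masked = random_mask(196, mask_ratio=0.75, seed=0)
```

Because the encoder sees only a quarter of the patches, pre-training is substantially cheaper per image than processing the full sequence, which is one of the practical simplifications the talk highlights.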
