
Simplifying and Simplifying Self-Supervised Visual Representation Pre-Training
Xinlei Chen

In this talk, I am going to cover our recent work in the self-supervised learning space for visual representation pre-training. The first is SimSiam, a non-contrastive, momentum-free framework that, to our surprise, can successfully avoid trivial solutions and achieve performance very competitive with more complicated methods like MoCo. The second is the Masked Autoencoder (MAE), a further simplification of self-supervised frameworks for computer vision that simply and directly reconstructs the input signal by predicting masked natural image patches. MAE adopts a BERT-like algorithm with crucial changes for images, and exhibits BERT-like scaling behavior, among other intriguing properties that differ from contrastive learning.
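For illustration, the core of MAE's pre-training recipe (randomly mask a large fraction of image patches and feed only the visible ones to the encoder) can be sketched as below. This is a minimal sketch, not the authors' implementation; the function name, `mask_ratio` default, and patch sizes are assumptions chosen to match the common MAE setup of 75% masking on 16x16 patches of a 224x224 image:

```python
import numpy as np

def random_mask(patches, mask_ratio=0.75, seed=0):
    """Randomly mask a fraction of image patches, MAE-style (sketch).

    patches: (N, D) array of N flattened patches.
    Returns the visible patches, their indices, and the masked indices.
    """
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)           # random shuffle of patch indices
    keep_idx = np.sort(perm[:n_keep])   # patches the encoder will see
    mask_idx = np.sort(perm[n_keep:])   # patches the decoder must predict
    return patches[keep_idx], keep_idx, mask_idx

# A 224x224 RGB image with 16x16 patches yields 196 patches of dim 768.
patches = np.zeros((196, 16 * 16 * 3))
visible, keep_idx, mask_idx = random_mask(patches)
print(visible.shape)  # only 49 of 196 patches (25%) reach the encoder
```

Because the encoder processes only the visible quarter of the patches, most of the compute is skipped during pre-training; the decoder then reconstructs the pixels at the masked positions.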

Author Information

Xinlei Chen (FAIR)
