

Poster

Learning by Reconstruction Produces Uninformative Features For Perception

Randall Balestriero · Yann LeCun


Abstract:

Input space reconstruction appears to be an attractive representation learning paradigm, e.g., using Principal Component Analysis or Denoising/Masked Auto-Encoders (MAEs). Despite interpretable reconstruction and generative abilities, we uncover three pitfalls of this strategy when it comes to producing Deep Network (DN) representations to be used for perception. Wasteful: reconstruction forces a model to allocate its capacity and training resources to the subspace of the data that explains most of the observed variance, a subspace whose features are uninformative for perception. For example, training a ResNet classifier on TinyImageNet projected onto the top subspace explaining 90% of the variance reaches 45% test accuracy, while projecting onto the bottom subspace accounting for only 20% of the image variance produces 55% test accuracy. Ill-conditioned: features useful for perception are only captured at the latest stage of training, since the principal subspace (uninformative for perception) is learned first. Ill-posed: for given train and test set reconstruction loss values, one can find two sets of parameters whose encoder embeddings offer drastically different classification performance, e.g., going from 60% to 86% on the ImageNet-10 test set.
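The "Wasteful" experiment can be illustrated with a minimal sketch: project the data onto the top principal subspace (most variance) versus the bottom one (least variance) and compare classifier accuracy on each. The sketch below is not the paper's pipeline; it uses scikit-learn's digits dataset and a logistic-regression probe in place of TinyImageNet and a ResNet, and the variance thresholds are illustrative.

```python
# Minimal sketch (assumed setup, not the authors' code): compare a linear probe
# trained on the top-variance principal subspace vs. the bottom-variance one.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit PCA on the training set only.
pca = PCA().fit(X_train)
explained = np.cumsum(pca.explained_variance_ratio_)

# Top subspace: fewest components explaining >= 90% of the variance.
k_top = int(np.searchsorted(explained, 0.90) + 1)
top = pca.components_[:k_top]        # shape (k_top, d)
# Bottom subspace: the remaining low-variance directions.
bottom = pca.components_[k_top:]     # shape (d - k_top, d)

def probe_accuracy(basis):
    """Center, project onto the given directions, then fit a linear probe."""
    Ztr = (X_train - pca.mean_) @ basis.T
    Zte = (X_test - pca.mean_) @ basis.T
    clf = LogisticRegression(max_iter=2000).fit(Ztr, y_train)
    return clf.score(Zte, y_test)

print(f"top {k_top} components ({explained[k_top - 1]:.0%} of variance): "
      f"{probe_accuracy(top):.3f}")
print(f"bottom {len(bottom)} components: {probe_accuracy(bottom):.3f}")
```

The point of the comparison is that the directions carrying most of the pixel variance need not be the directions carrying the class information, which is what the paper's TinyImageNet numbers (45% vs. 55%) exhibit.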
