The Loss Is Not Enough: Sampling Conditions and Inductive Bias in Contrastive Representation Learning
Abstract
Contrastive learning has emerged as a powerful paradigm for self-supervised representation learning, yet the precise conditions under which it recovers meaningful latent structure remain incompletely understood. We develop a measure-theoretic framework that formalizes the diversity condition, a constraint on the sampling mechanism that is necessary for recovering the latent space up to orthogonal transformation. We prove that when this condition is violated, as commonly occurs in practice when augmentations preserve semantic content, the optimal encoder no longer preserves geometric structure and linear identifiability is lost. Crucially, we demonstrate that the contrastive loss alone is insufficient for latent space reconstruction: encoder inductive bias emerges as a critical component that compensates for violations of the diversity condition. Our experiments on synthetic datasets and CIFAR-10 confirm these theoretical predictions, showing that architectural constraints become essential precisely when sampling diversity is limited. These findings have direct implications for the design of data augmentation strategies and encoder architectures in self-supervised contrastive learning systems.
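For concreteness, a standard instance of the contrastive objective in this setting is the InfoNCE loss; the abstract does not fix a specific form, so the expression below is an assumed representative formulation rather than the paper's exact objective. For an encoder $f$, a positive pair $(x, x^{+})$ produced by the sampling mechanism, negatives $x^{-}_{1},\dots,x^{-}_{M}$, and a temperature $\tau > 0$ (all notation introduced here for illustration):
\[
\mathcal{L}_{\mathrm{InfoNCE}}(f) \;=\; \mathbb{E}\!\left[ -\log \frac{\exp\!\big(f(x)^{\top} f(x^{+})/\tau\big)}{\exp\!\big(f(x)^{\top} f(x^{+})/\tau\big) + \sum_{i=1}^{M} \exp\!\big(f(x)^{\top} f(x^{-}_{i})/\tau\big)} \right]
\]
Under this reading, the diversity condition concerns the distribution from which $(x, x^{+})$ is drawn, while the inductive bias concerns which encoders $f$ the architecture can represent.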