Is Spurious Correlation Removal Always Learnable?
Yibo Zhou ⋅ Bo Li ⋅ Hai-Miao Hu ⋅ Hanzi Wang ⋅ Xiaokang Zhang ⋅ Ruifan Zhang
Abstract
Invariant learning can fail even when the invariant structure is statistically identifiable. We exhibit an inherent computational barrier: under the Planted Clique hypothesis, there exist samplable linear-Gaussian multi-environment instances with a one-dimensional invariant subspace ($k=1$) that are learnable from polynomially many samples by exhaustive search yet intractable for every polynomial-time algorithm; the hardness follows from an average-case reduction from a supervised sparse primitive. We further quantify environment diversity by a separation parameter $\gamma$, which governs both identifiability and the curvature of invariance objectives. Under sufficient diversity, the minimax risk is $\mathbb{E}[\mathrm{dist}(\hat{V},V_{\mathrm{inv}})^2]=\Theta\bigl(k(d-k)/(n|\mathcal{E}|)\bigr)$, and under label-induced shifts a phase transition occurs at $n^*\propto k(d-k)/(|\mathcal{E}|\gamma^2)$. Experiments on synthetic and real datasets confirm the predicted gaps and transitions and motivate simple diversity diagnostics.
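As a rough illustration of the two scaling laws stated in the abstract, the following sketch plugs hypothetical parameter values into the minimax-risk rate $k(d-k)/(n|\mathcal{E}|)$ and the phase-transition threshold $n^*\propto k(d-k)/(|\mathcal{E}|\gamma^2)$. The numerical values of $k$, $d$, $|\mathcal{E}|$, and $\gamma$ are made up for illustration, and the proportionality constants are set to one; this is not an implementation of the paper's method.

```python
import numpy as np

# Illustrative only: hypothetical parameter values, constants set to 1.
k, d = 1, 50          # invariant-subspace dimension and ambient dimension
n_envs = 5            # number of environments |E|
gamma = 0.2           # environment-separation parameter

def minimax_risk_scale(n, k, d, n_envs):
    """Order of E[dist(V_hat, V_inv)^2] ~ k(d-k) / (n |E|)."""
    return k * (d - k) / (n * n_envs)

def critical_sample_size(k, d, n_envs, gamma):
    """Order of the phase-transition threshold n* ~ k(d-k) / (|E| gamma^2)."""
    return k * (d - k) / (n_envs * gamma ** 2)

n_star = critical_sample_size(k, d, n_envs, gamma)
for n in (100, int(n_star), 10_000):
    print(f"n = {n:6d}   risk scale ~ {minimax_risk_scale(n, k, d, n_envs):.4f}")
print(f"phase-transition scale n* ~ {n_star:.0f}")
```

With these assumed values, the threshold scale is $n^*\approx 245$, below which (under label-induced shifts) the predicted transition would not yet have occurred.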