When Compression Helps Transfer: Optimal Sparse Model Selection under Distribution Shift for Low-Resource Deployment
Siddharth Karuturi ⋅ Kaustubh Bukkapatnam ⋅ Laksh Patel ⋅ Tanush A Shastry
Abstract
Deploying machine learning models in the Global South routinely requires two simultaneous adaptations that are almost always treated independently: (i) model compression to meet severe hardware constraints, and (ii) domain shift mitigation, because training data, often sourced from the Global North, differs substantially from deployment data. We show that these two objectives are not merely compatible but synergistic. We provide the first formal proof that there exists an optimal compression ratio $k^{*} \in (0, d)$ that minimises target-domain risk under distribution shift, and that this optimum strictly decreases with shift magnitude: more shift justifies more compression. The mechanism is clean: compression shrinks the hypothesis class, which reduces the $\mathcal{H}$-divergence between source and target distributions and thereby partially offsets the capacity cost. We instantiate our theory in \textsc{CompressForShift}, an algorithm that selects $k^{*}$ using only unlabeled target data, and validate it on four geographically and culturally diverse benchmarks spanning vision and language. \textsc{CompressForShift} comes within 0.1--0.4 percentage points of an oracle that has access to target labels, and uniformly outperforms both the uncompressed model and naive compression heuristics.
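The trade-off described in the abstract can be illustrated with the standard domain-adaptation bound of Ben-David et al. This is a sketch under stated assumptions and not necessarily the paper's own theorem: we assume $k$ counts retained parameters (so smaller $k$ means more compression), we write $\mathcal{H}_k$ for the hypothesis class reachable at sparsity level $k$, and the symbols $\epsilon_S$, $\epsilon_T$, $\lambda_k$, $\mathcal{D}_S$, $\mathcal{D}_T$ are notation introduced here for illustration.

```latex
% Assumed setting: nested sparse classes, H_k \subseteq H_{k'} whenever k \le k'.
% Standard bound applied within H_k:
\epsilon_T(h) \;\le\; \epsilon_S(h)
  \;+\; \tfrac{1}{2}\, d_{\mathcal{H}_k \Delta \mathcal{H}_k}(\mathcal{D}_S, \mathcal{D}_T)
  \;+\; \lambda_k,
\qquad h \in \mathcal{H}_k,
\quad
\lambda_k \;=\; \min_{h' \in \mathcal{H}_k} \bigl[\epsilon_S(h') + \epsilon_T(h')\bigr].

% The divergence is a supremum over the class, so it cannot grow when the class shrinks:
k \le k' \;\Longrightarrow\;
d_{\mathcal{H}_k \Delta \mathcal{H}_k}(\mathcal{D}_S, \mathcal{D}_T)
\;\le\;
d_{\mathcal{H}_{k'} \Delta \mathcal{H}_{k'}}(\mathcal{D}_S, \mathcal{D}_T).

% The capacity terms move the other way: the minimum is taken over a smaller set.
k \le k' \;\Longrightarrow\; \lambda_k \;\ge\; \lambda_{k'},
\qquad
\min_{h \in \mathcal{H}_k} \epsilon_S(h) \;\ge\; \min_{h \in \mathcal{H}_{k'}} \epsilon_S(h).
```

Read this way, lowering $k$ tightens the divergence term while loosening the capacity terms, which is the tension that an interior optimum $k^{*}$ would balance. The sketch only orders the terms; it does not by itself establish that the optimum exists or that it decreases with shift magnitude.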