All ERMs Can Fail in Stochastic Convex Optimization: Lower Bounds in Linear Dimension
Tal Burla ⋅ Roi Livni
Abstract
We study the sample complexity of *best-case* Empirical Risk Minimizers in the setting of Stochastic Convex Optimization. We show that there exists an instance where, with sample size linear in the dimension, learning is possible, yet an Empirical Risk Minimizer is likely to be *unique* and to *overfit*. This resolves an open question of Feldman. We also extend this result to approximate ERMs. Building on our construction, we further show that (constrained) Gradient Descent can overfit when the horizon and learning rate grow with the sample size. Specifically, we provide a novel generalization lower bound of $\Omega\left(\eta T/(m\sqrt{m})\right)$ for Gradient Descent, where $\eta$ is the learning rate, $T$ is the horizon, and $m$ is the sample size. This exponentially narrows the gap between the best known upper bound of $O(\eta T/m)$ and the lower bounds obtained from previous constructions.
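For readers unfamiliar with the algorithm referenced above, the following is a minimal, hypothetical sketch of *constrained* Gradient Descent as the abstract parameterizes it: $T$ steps with learning rate $\eta$ on the empirical risk over $m$ samples, with a Euclidean projection back onto the constraint set after each step. The loss, data, and all function names here are illustrative placeholders, not the paper's construction.

```python
import numpy as np

def projected_gd(grad_empirical, w0, eta, T, radius=1.0):
    """Run T steps of gradient descent with step size eta,
    projecting each iterate onto the Euclidean ball of the
    given radius (the constraint set)."""
    w = w0.copy()
    for _ in range(T):
        w = w - eta * grad_empirical(w)
        norm = np.linalg.norm(w)
        if norm > radius:
            # Euclidean projection onto the ball: rescale to the boundary
            w = w * (radius / norm)
    return w

# Toy empirical risk: average squared loss over m samples (placeholder data).
rng = np.random.default_rng(0)
m, d = 50, 10
X = rng.standard_normal((m, d))
y = rng.standard_normal(m)

def grad_empirical(w):
    # Gradient of (1/(2m)) * ||Xw - y||^2
    return X.T @ (X @ w - y) / m

w_hat = projected_gd(grad_empirical, np.zeros(d), eta=0.1, T=200)
print(np.linalg.norm(w_hat) <= 1.0 + 1e-9)  # iterate stays in the constraint set
```

The point of the sketch is only to fix notation: the generalization bounds in the abstract scale with the product $\eta T$, i.e., with how far such iterates can travel during training.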