CAGrad-Lite: Robust Coreset-Aware Gradient Descent for Low-Data Regimes
Krishna Gupta
Abstract
We address the challenge of training deep neural networks in low-data regimes, where coreset-based subsampling amplifies gradient noise, destabilizes optimization, and degrades generalization. We propose CAGrad-Lite, a coreset-aware optimizer that adapts its update rule to the active data fraction $f \in (0,1]$. The method integrates three complementary mechanisms: (i) Coreset-Fraction-Aware Momentum (CFAM), which increases the effective momentum when $f$ is small via $\beta_{1,\text{eff}} = 1 - (1 - \beta_1)\sqrt{f}$ to smooth high-variance updates; (ii) Gradient Variance Reduction via EMA Correction (GVREC), which stabilizes learning through the second-moment estimate $v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$; and (iii) Adaptive Per-Layer Gradient Clipping (AGC), which controls gradient explosion using the parameter-to-gradient norm ratio $R = \lambda \frac{\max(\|W\|_2, \epsilon)}{\max(\|g_t\|_2, \epsilon)}$. The resulting update uses decoupled weight decay: $$ W_t = W_{t-1}(1 - \eta \lambda_{wd}) - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}. $$ Experiments on CIFAR-10, EuroSAT, and BloodMNIST show that CAGrad-Lite consistently improves accuracy and reduces variance at extreme data fractions (e.g., 5%–10%), outperforming standard optimizers such as SGD and Adam. These results highlight the importance of coreset-aware optimization for robust and stable learning under severe data constraints.
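To make the three mechanisms concrete, the following is a minimal NumPy sketch of a single update step for one layer. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify how $R$ is applied, how $\hat{m}_t$ and $\hat{v}_t$ are formed, or the order of operations, so the sketch assumes: the gradient is rescaled by $\min(1, R)$, the first moment is the usual EMA $m_t = \beta_{1,\text{eff}} m_{t-1} + (1 - \beta_{1,\text{eff}}) g_t$, the hats denote Adam-style bias correction, and clipping happens before the moment updates. The function names and all default hyperparameters are hypothetical.

```python
import numpy as np


def effective_beta1(beta1, f):
    """CFAM: beta1_eff = 1 - (1 - beta1) * sqrt(f), for f in (0, 1]."""
    if not 0.0 < f <= 1.0:
        raise ValueError("data fraction f must lie in (0, 1]")
    return 1.0 - (1.0 - beta1) * np.sqrt(f)


def agc_clip(w, g, lam=0.01, eps=1e-3):
    """AGC: R = lam * max(||W||_2, eps) / max(||g||_2, eps).

    Assumption: the gradient is rescaled by min(1, R), so it is only ever
    shrunk. The abstract defines R but not how it is applied.
    """
    ratio = lam * max(np.linalg.norm(w), eps) / max(np.linalg.norm(g), eps)
    return g * min(1.0, ratio)


def cagrad_lite_step(w, g, m, v, t, f, lr=1e-3, beta1=0.9, beta2=0.999,
                     eps=1e-8, weight_decay=0.0, clip_lambda=0.01):
    """One update for a single layer; t is the 1-based step counter."""
    b1 = effective_beta1(beta1, f)          # (i) CFAM
    g = agc_clip(w, g, lam=clip_lambda)     # (iii) AGC, assumed to run first
    m = b1 * m + (1.0 - b1) * g             # first moment (assumed EMA form)
    v = beta2 * v + (1.0 - beta2) * g ** 2  # (ii) GVREC second moment
    m_hat = m / (1.0 - b1 ** t)             # assumed Adam-style bias correction
    v_hat = v / (1.0 - beta2 ** t)
    # Decoupled weight decay followed by the adaptive step.
    w = w * (1.0 - lr * weight_decay) - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v


if __name__ == "__main__":
    w = np.array([1.0, -2.0])
    g = np.array([0.5, -0.5])
    m, v = np.zeros_like(w), np.zeros_like(w)
    w, m, v = cagrad_lite_step(w, g, m, v, t=1, f=0.05)
    print("effective beta1 at f=0.05:", effective_beta1(0.9, 0.05))
    print("updated weights:", w)
```

With $\beta_1 = 0.9$, the CFAM formula gives $\beta_{1,\text{eff}} = 0.9$ at $f = 1$ (the full dataset recovers the base momentum) and $\beta_{1,\text{eff}} = 0.98$ at $f = 0.04$, so smaller coresets average gradients over a longer horizon. For a multi-layer model, the step would be applied to each parameter tensor separately so that the clipping ratio is computed per layer.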