AMH: AdaGrad-Momentum Hybrid for Robust Training on Small Coresets
Abstract
Training deep neural networks under extreme data scarcity is difficult because gradient variance is high and optimization dynamics are unstable. Plain SGD suffers from noisy updates, while adaptive optimizers such as AdamW can overfit to spurious gradient statistics when data is limited. These challenges are particularly acute in low-resource settings, common in many Muslim-majority regions, where large datasets are often unavailable. We propose AMH (AdaGrad-Momentum Hybrid), a simple optimization method designed for robust training on small coresets. AMH combines adaptive per-parameter scaling, momentum-based smoothing, and a leaky squared-gradient accumulator that prevents the rapid learning-rate decay of classical AdaGrad. The resulting updates are stable yet remain responsive to new gradient information. We evaluate AMH on EuroSAT, CIFAR-10, and BloodMNIST using ResNet-18 and EfficientNet-B0 trained from scratch at coreset fractions of 100\%, 25\%, and 5\%, comparing it against vanilla gradient descent, SGD with momentum, mini-batch gradient descent, and AdamW. AMH consistently improves performance on small coresets, particularly at 5\%, while remaining competitive at larger data fractions. An ablation study shows that each component (adaptive scaling, momentum, and the leaky accumulator) contributes to improved stability and generalization. These findings highlight the importance of variance-aware optimization in data-constrained regimes and are of direct relevance to researchers and practitioners working with limited data in Muslim-majority countries.
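The abstract names the three ingredients of AMH but does not state the update rule, so the following is only a minimal sketch of how such a hybrid could look, not the authors' exact formulation. The function names (`amh_step`, `init_state`, `accumulator_after`), the hyperparameter names (`lr`, `beta`, `rho`, `eps`), their default values, and the specific form of the leak (`v = rho * v + g**2`) are all assumptions made for illustration. Setting `rho = 1` recovers the classical AdaGrad accumulator, which grows without bound and so shrinks the effective learning rate; any `rho < 1` keeps the accumulator bounded.

```python
import numpy as np


def init_state(param):
    """Zero-initialised momentum and accumulator buffers (hypothetical helper)."""
    return {"m": np.zeros_like(param), "v": np.zeros_like(param)}


def amh_step(param, grad, state, lr=0.1, beta=0.9, rho=0.9, eps=1e-8):
    """One illustrative AMH-style update; all defaults are assumed values."""
    # Leaky squared-gradient accumulator: old statistics decay by rho,
    # so v stays bounded instead of growing monotonically as in AdaGrad.
    state["v"] = rho * state["v"] + grad ** 2
    # Momentum-based smoothing of the raw gradient.
    state["m"] = beta * state["m"] + (1.0 - beta) * grad
    # Adaptive per-parameter scaling of the smoothed gradient.
    return param - lr * state["m"] / (np.sqrt(state["v"]) + eps)


def accumulator_after(steps, rho):
    """Accumulator value after `steps` updates with a constant unit gradient."""
    x = np.zeros(1)
    state = init_state(x)
    for _ in range(steps):
        x = amh_step(x, np.ones(1), state, rho=rho)
    return float(state["v"][0])


if __name__ == "__main__":
    # rho = 1.0 behaves like classical AdaGrad; rho < 1.0 is the leaky variant.
    print("AdaGrad-style accumulator:", accumulator_after(100, rho=1.0))
    print("Leaky accumulator:        ", accumulator_after(100, rho=0.9))
```

Under this assumed form, a constant unit gradient drives the AdaGrad-style accumulator up linearly with the step count, while the leaky accumulator saturates near `1 / (1 - rho)`, which is the mechanism the abstract credits with preventing rapid learning-rate decay.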