VarNorm-GD: Gradient Variance Normalisation for Scarce-Data Image Classification
MUNUSAMY M ⋅ Sai Adith Prakash ⋅ Aryan Rajput
Abstract
Training deep neural networks under data scarcity is a fundamental challenge: mini-batch gradient estimates become noisy and unreliable when coreset sizes shrink to 5--25% of a full dataset. We propose VarNorm-GD, a gradient-descent variant that estimates per-parameter centred gradient variance across a sliding FIFO buffer of recent mini-batches and rescales each update by $1/\sqrt{\hat{\sigma}^2 + \epsilon}$, directly suppressing the gradient noise amplified by small coresets. VarNorm-GD further incorporates an exponential moving average of gradient variance ("variance memory") with decay $\beta_2 = 0.99$ to stabilise estimates in the low-data regime. We evaluate on three image-classification benchmarks---EuroSAT (satellite imagery), BloodMNIST (medical imaging), and CIFAR-10 (natural images)---using ResNet-18 and EfficientNet-B0 trained from scratch at coreset fractions of 100%, 25%, and 5%. At the 5% coreset, VarNorm-GD achieves 65.72% on EuroSAT/ResNet-18 and 71.59% on BloodMNIST/ResNet-18, outperforming all baselines including Adam, AdamW, and SGD with momentum. Ablation studies confirm that the variance-memory component is decisive for low-data stability.
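The update described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Only the FIFO buffer, the centred per-parameter variance, the $1/\sqrt{\hat{\sigma}^2 + \epsilon}$ rescaling, and the decay $\beta_2 = 0.99$ come from the text. The class name `VarNormGD`, the defaults for `lr`, `buffer_size`, and `eps`, the plain gradient step used while the buffer holds fewer than two gradients, the initialisation of the variance memory from the first buffer estimate, and the choice to feed the buffer variance into the moving average are all assumptions made for illustration.

```python
from collections import deque
import math


class VarNormGD:
    """Illustrative sketch of a variance-normalised gradient step (assumptions noted inline)."""

    def __init__(self, params, lr=0.01, buffer_size=8, beta2=0.99, eps=1e-8):
        self.params = list(params)   # flat list of float parameters
        self.lr = lr                 # assumed default; not given in the abstract
        self.beta2 = beta2           # variance-memory decay (0.99 in the abstract)
        self.eps = eps               # assumed default; not given in the abstract
        # FIFO buffer: deque(maxlen=...) drops the oldest gradient automatically.
        self.buffer = deque(maxlen=buffer_size)
        self.var_memory = None       # EMA of per-parameter variance

    def _buffer_variance(self):
        """Centred (mean-subtracted) per-parameter variance over the buffer."""
        n = len(self.buffer)
        dim = len(self.params)
        means = [sum(g[i] for g in self.buffer) / n for i in range(dim)]
        return [
            sum((g[i] - means[i]) ** 2 for g in self.buffer) / n
            for i in range(dim)
        ]

    def step(self, grad):
        grad = list(grad)
        self.buffer.append(grad)

        # Assumption: with a single buffered gradient the variance is zero,
        # so fall back to a plain gradient step during warm-up.
        if len(self.buffer) < 2:
            self.params = [p - self.lr * g for p, g in zip(self.params, grad)]
            return self.params

        var = self._buffer_variance()
        if self.var_memory is None:
            # Assumption: seed the variance memory with the first estimate.
            self.var_memory = var
        else:
            self.var_memory = [
                self.beta2 * m + (1.0 - self.beta2) * v
                for m, v in zip(self.var_memory, var)
            ]

        # Rescale each coordinate of the update by 1 / sqrt(variance + eps).
        self.params = [
            p - self.lr * g / math.sqrt(m + self.eps)
            for p, g, m in zip(self.params, grad, self.var_memory)
        ]
        return self.params
```

In this sketch, coordinates whose recent gradients disagree (high variance) take smaller steps, while coordinates with consistent gradients take larger ones. A practical version would operate on framework tensors rather than Python lists.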