Correcting Optimizer Selection Bias via Large Deviation Hazards
Andrea Zerio ⋅ Andres R Masegosa
Abstract
Empirical risk minimisation systematically exploits finite-sample fluctuations of the training loss, producing the optimiser selection bias, which is responsible for miscalibration and generalisation failure in the interpolation regime. We introduce SGDR, a drop-in modification of SGD that corrects this bias by gating mini-batches through a two-sided rejection rule derived from the hazard transform, with the population hazards estimated via rate functions from large deviation theory. Across nine architectures spanning image and graph classification, SGDR matches or improves on baseline task performance while sharply reducing expected calibration error and overfitting, using a fraction of the training time and gradient updates required by standard SGD.
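The abstract does not specify the hazard transform or the exact form of SGDR's rejection rule. The sketch below is therefore only a generic illustration of the large-deviation ingredient, not the authors' method. It assumes the per-example 0-1 loss is Bernoulli with a population error rate estimated by `p_hat`. It uses the Cramér rate function for a Bernoulli mean, I(x) = KL(x || p), together with the Chernoff estimate exp(-n I(x)) of the one-sided tail probability on x's side of p. It rejects any mini-batch whose mean loss is atypical in either direction. The names `bernoulli_rate`, `tail_estimate`, `accept_batch` and the threshold `alpha` are hypothetical.

```python
import math


def bernoulli_rate(x: float, p: float) -> float:
    """Cramér rate function I(x) = KL(Bernoulli(x) || Bernoulli(p))."""
    # Clamp p away from 0 and 1 so the logarithms stay finite.
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)

    def term(a: float, b: float) -> float:
        # a * log(a / b), using the convention 0 * log 0 = 0.
        return 0.0 if a == 0.0 else a * math.log(a / b)

    return term(x, p) + term(1.0 - x, 1.0 - p)


def tail_estimate(x: float, p: float, n: int) -> float:
    """Chernoff estimate exp(-n I(x)) of the probability that the mean of
    n Bernoulli(p) draws lies at or beyond x, on x's side of p."""
    return math.exp(-n * bernoulli_rate(x, p))


def accept_batch(batch_losses, p_hat: float, alpha: float = 0.05) -> bool:
    """Hypothetical two-sided gate (an assumption, not SGDR itself): keep a
    mini-batch only if its mean 0-1 loss is not large-deviation-atypical,
    either too low or too high, relative to the estimated rate p_hat."""
    n = len(batch_losses)
    x = sum(batch_losses) / n
    return tail_estimate(x, p_hat, n) >= alpha


# Usage: with p_hat = 0.5, a batch with half its examples misclassified is
# kept, while a batch of 32 all-correct examples is rejected as atypical.
print(accept_batch([1] * 16 + [0] * 16, p_hat=0.5))
print(accept_batch([0] * 32, p_hat=0.5))
```

Because the rate function grows on both sides of `p_hat`, a single threshold rejects both unusually easy and unusually hard batches. The real SGDR rule may weight or compare these two tails differently.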