Poster in Workshop: Combining Theory and Benchmarks: Towards A Virtuous Cycle to Understand and Guarantee Foundation Model Performance
Thu, Jul 9, 2026 • 7:00 PM – 8:00 PM PDT

Correcting Optimizer Selection Bias via Large Deviation Hazards

Andrea Zerio ⋅ Andres R Masegosa


Abstract

Empirical risk minimisation systematically exploits finite-sample fluctuations of the training loss. This produces optimiser selection bias, which is responsible for miscalibration and generalisation failure in the interpolation regime. We introduce SGDR, a drop-in modification to SGD that corrects this bias by gating mini-batches through a two-sided rejection rule derived from the hazard transform, with the population hazards estimated via rate functions from large deviation theory. Across nine architectures spanning image and graph classification, SGDR matches or improves on baseline task performance while sharply reducing expected calibration error and overfitting, using a fraction of the training time and gradient updates required by standard SGD.
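To illustrate the general idea of gating mini-batches with a two-sided, large-deviation-based test, here is a minimal sketch. It is not the SGDR rule, whose details the abstract does not give, and it does not model the hazard transform. Everything here is an assumption for illustration: the hypothetical class `TwoSidedBatchGate`, the Gaussian Cramér rate function I(x) = (x − μ)² / (2σ²) for the mean of per-example losses, the running (Welford) estimates of μ and σ², and the threshold `alpha`. A batch is rejected when its mean loss is improbably far from the running estimate in either direction, with "improbable" meaning the leading-order tail estimate 2·exp(−n·I(x)) falls below `alpha`.

```python
import math


class TwoSidedBatchGate:
    """Hypothetical two-sided mini-batch gate (illustrative, not SGDR itself).

    Assumes per-example losses are roughly Gaussian, so the Cramér rate
    function for their sample mean is I(x) = (x - mu)^2 / (2 * sigma^2).
    """

    def __init__(self, alpha: float = 0.05):
        self.alpha = alpha  # assumed rejection threshold on the tail estimate
        self.count = 0      # number of per-example losses seen so far
        self.mean = 0.0     # running estimate of the population mean loss
        self.m2 = 0.0       # running sum of squared deviations (Welford)

    def update(self, losses):
        """Welford online update of the per-example loss mean and variance."""
        for x in losses:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        """Unbiased sample variance; infinite until two losses are seen."""
        return self.m2 / (self.count - 1) if self.count > 1 else float("inf")

    def accept(self, losses) -> bool:
        """Accept a batch unless its mean loss is improbably high OR low."""
        if self.count < 2:
            return True  # no population estimate yet, so never reject
        n = len(losses)
        batch_mean = sum(losses) / n
        sigma2 = max(self.variance(), 1e-12)  # guard against zero variance
        rate = (batch_mean - self.mean) ** 2 / (2.0 * sigma2)
        # Leading-order two-sided tail estimate P(|mean - mu| >= |dev|) ~ 2 exp(-n I).
        tail = min(1.0, 2.0 * math.exp(-n * rate))
        return tail >= self.alpha


gate = TwoSidedBatchGate(alpha=0.05)
gate.update([1.0, 1.2, 0.8, 1.1, 0.9])        # running mean 1.0, variance 0.025
print(gate.accept([1.0, 1.05, 0.95, 1.0]))    # typical batch
print(gate.accept([3.0, 3.0, 3.0, 3.0]))      # unusually high loss
print(gate.accept([-1.0, -1.0, -1.0, -1.0]))  # unusually low loss
```

In a training loop under these assumptions, one would compute per-example losses for each mini-batch, take a gradient step only if `accept` returns `True`, and then call `update`. Rejecting batches on both tails is what makes the gate two-sided: unusually low losses, which an optimiser would otherwise eagerly exploit, are skipped just like unusually high ones.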