Better generalization with less data using robust gradient descent

Matthew J. Holland · Kazushi Ikeda

Pacific Ballroom #192

Keywords: [ Supervised Learning ] [ Statistical Learning Theory ]


For learning tasks where the data (or losses) may be heavy-tailed, algorithms based on empirical risk minimization may require a substantial number of observations to perform well out of sample. In pursuit of stronger performance under weaker assumptions, we propose a technique that iteratively computes a cheap, robust estimate of the risk gradient, which can be fed directly into any steepest descent procedure. Finite-sample risk bounds are provided under weak moment assumptions on the loss gradient. The algorithm is simple to implement, and empirical tests using simulations and real-world data illustrate that more efficient and reliable learning is possible without prior knowledge of the loss tails.
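The core idea, replacing the empirical mean of per-sample loss gradients with a robust estimate before each descent step, can be sketched as follows. This is an illustrative sketch, not the paper's implementation: it uses a coordinate-wise median-of-means estimator as a stand-in for the paper's robust gradient estimate, and the block count `k`, step size, and squared-loss model are arbitrary choices for the demo.

```python
import numpy as np

def mom_gradient(grads, k=5, rng=None):
    """Median-of-means estimate of the mean gradient.

    Splits the n per-sample gradients into k blocks, averages within each
    block, and takes the coordinate-wise median of the block means. This is
    a simple robust stand-in for the paper's gradient estimator; k is a
    tuning choice, not a value from the paper.
    """
    n = grads.shape[0]
    perm = (rng or np.random.default_rng()).permutation(n)
    block_means = np.stack([b.mean(axis=0) for b in np.array_split(grads[perm], k)])
    return np.median(block_means, axis=0)

def robust_gd(X, y, steps=200, lr=0.1, k=5, seed=0):
    """Steepest descent on the squared loss, using the robust gradient."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        resid = X @ w - y              # per-sample residuals
        grads = resid[:, None] * X     # per-sample squared-loss gradients
        w -= lr * mom_gradient(grads, k=k, rng=rng)
    return w

# Heavy-tailed demo: linear model with Student-t (df=2) noise,
# whose variance is infinite, so the plain sample mean is unreliable.
rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(500, 2))
y = X @ w_true + rng.standard_t(df=2, size=500)
w_hat = robust_gd(X, y)
```

Because only the gradient-aggregation step changes, the same wrapper drops into momentum or other first-order update rules unchanged.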