For learning tasks where the data (or losses) may be heavy-tailed, algorithms based on empirical risk minimization may require a substantial number of observations to perform well off-sample. In pursuit of stronger performance under weaker assumptions, we propose a technique that uses a cheap, robust, iterative estimate of the risk gradient, which can easily be fed into any steepest descent procedure. Finite-sample risk bounds are provided under weak moment assumptions on the loss gradient. The algorithm is simple to implement, and empirical tests on simulated and real-world data illustrate that more efficient and reliable learning is possible without prior knowledge of the loss tails.
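The abstract does not spell out the gradient estimator, so the following is only a minimal sketch under stated assumptions, not the authors' method. It computes per-sample loss gradients and replaces their plain average with a coordinate-wise M-estimate of location. That estimate is obtained by fixed-point iteration started at the median, using a Catoni-style influence function ψ and a MAD-based scale (both assumptions), and is then plugged into ordinary gradient descent for least-squares regression. The names `catoni_psi`, `robust_mean`, `robust_gradient`, and `fit_linear` are hypothetical. The paper's actual choice of ψ, scale, and stopping rule may differ.

```python
import math
import random
import statistics


def catoni_psi(u: float) -> float:
    """Catoni-style influence function (an assumption): ~linear near 0, logarithmic in the tails."""
    return math.copysign(math.log1p(abs(u) + 0.5 * u * u), u)


def robust_mean(values, max_iter=100, tol=1e-10):
    """M-estimate of location via fixed-point iteration, started at the median.

    Assumption: the scale comes from the median absolute deviation (MAD);
    the paper's own scale setting may differ.
    """
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    scale = 1.4826 * mad
    if scale == 0.0:
        # More than half the values coincide with the median; keep it.
        return center
    n = len(values)
    x = center
    for _ in range(max_iter):
        # Since 0 < psi' <= 1, this update is monotone and does not overshoot.
        step = scale / n * sum(catoni_psi((v - x) / scale) for v in values)
        x += step
        if abs(step) < tol:
            break
    return x


def robust_gradient(per_sample_grads):
    """Coordinate-wise robust estimate of the risk gradient (hypothetical helper)."""
    d = len(per_sample_grads[0])
    return [robust_mean([g[j] for g in per_sample_grads]) for j in range(d)]


def fit_linear(xs, ys, step_size=0.1, n_steps=200):
    """Least-squares linear regression trained by steepest descent with robust gradients."""
    d = len(xs[0])
    w = [0.0] * d
    for _ in range(n_steps):
        grads = []
        for x, y in zip(xs, ys):
            residual = sum(wj * xj for wj, xj in zip(w, x)) - y
            # Gradient of 0.5 * residual**2 with respect to w.
            grads.append([residual * xj for xj in x])
        g_hat = robust_gradient(grads)
        w = [wj - step_size * gj for wj, gj in zip(w, g_hat)]
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[rng.uniform(0.5, 1.5)] for _ in range(200)]
    # Symmetric heavy-tailed noise: Pareto(1.5) has a finite mean but infinite variance.
    ys = [2.0 * x[0] + rng.choice((-1, 1)) * (rng.paretovariate(1.5) - 1.0) for x in xs]
    print(fit_linear(xs, ys))
```

Because ψ grows only logarithmically, a single extreme gradient coordinate moves the estimate far less than it would move the sample mean. For example, on `[0.9, 0.95, 1.0, 1.0, 1.1, 100.0]` the estimate stays near the bulk of the data, while the plain mean exceeds 17. No tuning to the noise distribution is required beyond the scale rule.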
Matthew Holland (Osaka University)
Kazushi Ikeda (Nara Institute of Science and Technology)
Related Events
2019 Poster: Better generalization with less data using robust gradient descent
Thu Jun 13th 01:30 -- 04:00 AM Room Pacific Ballroom