Workshop: Subset Selection in Machine Learning: From Theory to Applications

When does loss-based prioritization fail?

Niel Hu · Xinyu Hu · Rosanne Liu · Sara Hooker · Jason Yosinski

Keywords: [ Multitask, Transfer, and Meta Learning ] [ Algorithms ]


Not all examples are created equal, but standard deep neural network training protocols treat each training point uniformly. Each example is propagated forward and backward through the network the same number of times, independent of how much the example contributes to learning. Recent work has proposed ways to accelerate training by deviating from this uniform treatment. Popular methods up-weight examples that contribute more to the loss, on the intuition that examples with low loss have already been learned by the model, so their marginal value to the training procedure should be lower. This view assumes that updating the model with high-loss examples will be beneficial. However, this assumption may not hold for noisy, real-world data. In this paper, we theorize and then empirically demonstrate that loss-based acceleration methods degrade in scenarios with noisy and corrupted data. Our work suggests that measures of example difficulty need to correctly separate noise from other types of challenging examples.
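To make the failure mode concrete, here is a minimal toy sketch in Python (NumPy only). It is not the authors' method or experimental setup. The function names, the idealized "mostly trained" model that assigns probability 0.9 to the true class, the symmetric label-flipping noise, and all parameter values are assumptions chosen for illustration. The sketch keeps the highest-loss fraction of a batch and measures how many of the kept examples carry corrupted labels.

```python
import numpy as np


def cross_entropy(probs, labels):
    """Per-example cross-entropy of predicted class probabilities."""
    eps = 1e-12  # avoids log(0)
    return -np.log(probs[np.arange(len(labels)), labels] + eps)


def select_high_loss(losses, keep_fraction):
    """Indices of the `keep_fraction` of examples with the highest loss."""
    k = max(1, int(round(keep_fraction * len(losses))))
    return np.argsort(losses)[-k:]


def noisy_fraction_selected(n=1000, num_classes=10, noise_rate=0.2,
                            keep_fraction=0.1, seed=0):
    """Return (overall noise rate, noise rate among selected examples)."""
    rng = np.random.default_rng(seed)
    true_labels = rng.integers(0, num_classes, size=n)

    # Assumption: an idealized model that has already learned the clean
    # task and puts probability 0.9 on the true class of every example.
    probs = np.full((n, num_classes), 0.1 / (num_classes - 1))
    probs[np.arange(n), true_labels] = 0.9

    # Assumption: symmetric label noise that flips a random subset of
    # labels to a different class.
    is_noisy = rng.random(n) < noise_rate
    shift = rng.integers(1, num_classes, size=n)
    observed = np.where(is_noisy, (true_labels + shift) % num_classes,
                        true_labels)

    # Losses are computed against the observed (possibly wrong) labels,
    # which is all a loss-based prioritization scheme gets to see.
    losses = cross_entropy(probs, observed)
    selected = select_high_loss(losses, keep_fraction)
    return is_noisy.mean(), is_noisy[selected].mean()


base_rate, selected_rate = noisy_fraction_selected()
print(f"noise rate overall:  {base_rate:.2f}")
print(f"noise rate selected: {selected_rate:.2f}")
```

In this idealized setting the mislabeled examples are exactly the ones with high loss, so the selected subset is dominated by noise. Real models and real noise are less clean-cut, but the sketch shows why loss alone cannot distinguish a corrupted example from a genuinely hard one.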