Pre-Training on a Data Diet: Identifying Sufficient Examples for Early Training
Mansheej Paul · Brett Larsen · Surya Ganguli · Jonathan Frankle · Gintare Karolina Dziugaite
Event URL: https://openreview.net/forum?id=U5QRuy_LjUY

A striking observation about iterative magnitude pruning (IMP; Frankle et al. 2020) is that, after just a few hundred steps of dense training, the method can find a sparse sub-network that can be trained to the same accuracy as the dense network. However, the same does not hold at step 0, i.e., at random initialization. In this work, we seek to understand, through the lens of the data distribution, how this early phase of pre-training leads to a good initialization for IMP. Empirically, we observe that, holding the number of pre-training iterations constant, training on a small fraction of (randomly chosen) data suffices to obtain an equally good initialization for IMP. We additionally observe that, by pre-training only on "easy" training data, we can decrease the number of steps necessary to find a good initialization for IMP compared to training on the full dataset or a randomly chosen subset. Combined, these results provide new insight into the role played by data in the early phase of training.
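
The two ingredients described above, a randomly chosen data subset for pre-training and magnitude pruning with rewinding to the pre-trained weights, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `train_fn` callback, the 20% per-round pruning fraction, and the flat weight array are all assumptions made for illustration only.

```python
import numpy as np


def random_subset_indices(num_examples, fraction, seed=0):
    """Pick a random subset of example indices, without replacement.

    Hypothetical helper: `fraction` is the share of the dataset used
    for the short pre-training phase.
    """
    rng = np.random.default_rng(seed)
    k = max(1, int(round(fraction * num_examples)))
    return rng.choice(num_examples, size=k, replace=False)


def imp_with_rewinding(rewind_weights, train_fn, rounds, prune_fraction=0.2):
    """Sketch of iterative magnitude pruning with weight rewinding.

    rewind_weights: weights saved after the short pre-training phase.
    train_fn(weights, mask): assumed user-supplied callback that trains
        the masked network and returns the trained weights.
    Returns a boolean mask; False marks a pruned weight.
    """
    mask = np.ones(rewind_weights.shape, dtype=bool)
    for _ in range(rounds):
        # Each round restarts from the rewind point, not from the last
        # round's trained weights.
        trained = train_fn(rewind_weights * mask, mask)

        # Prune the smallest-magnitude share of the surviving weights.
        surviving = np.flatnonzero(mask)
        num_pruned = int(prune_fraction * surviving.size)
        order = np.argsort(np.abs(trained.ravel()[surviving]), kind="stable")

        flat_mask = mask.ravel().copy()
        flat_mask[surviving[order[:num_pruned]]] = False
        mask = flat_mask.reshape(mask.shape)
    return mask
```

In this sketch, `rewind_weights` would come from a few hundred steps of dense training restricted to the indices returned by `random_subset_indices`; how those steps are run, and how "easy" examples would be scored and selected instead of random ones, is left outside the sketch.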

Author Information

Mansheej Paul (Stanford University)
Brett Larsen (Stanford University)
Surya Ganguli (Stanford)
Jonathan Frankle (MosaicML / Harvard)
Gintare Karolina Dziugaite (Element AI, a ServiceNow Company)