Spotlight

Open-Sampling: Exploring Out-of-Distribution data for Re-balancing Long-tailed datasets

Hongxin Wei ⋅ Lue Tao ⋅ RENCHUNZI XIE ⋅ LEI FENG ⋅ Bo An

Keywords: MISC: Unsupervised and Semi-supervised Learning MISC: Representation Learning DL: Other Representation Learning T: Deep Learning

2022 Spotlight

[ Paper PDF]

Abstract

Deep neural networks usually perform poorly when the training dataset suffers from extreme class imbalance. Recent studies found that directly training with out-of-distribution data (i.e., open-set samples) in a semi-supervised manner would harm the generalization performance. In this work, we theoretically show that out-of-distribution data can still be leveraged to augment the minority classes from a Bayesian perspective. Based on this motivation, we propose a novel method called Open-sampling, which utilizes open-set noisy labels to re-balance the class priors of the training dataset. For each open-set instance, the label is sampled from our pre-defined distribution that is complementary to the distribution of original class priors. We empirically show that Open-sampling not only re-balances the class priors but also encourages the neural network to learn separable representations. Extensive experiments demonstrate that our proposed method significantly outperforms existing data re-balancing methods and can boost the performance of existing state-of-the-art methods.

Video

Chat is not available.