ICML Poster Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification


Nan Lu · Shida Lei · Gang Niu · Issei Sato · Masashi Sugiyama

Keywords: [ Semi-supervised learning ] [ Algorithms ]


Abstract: To cope with high annotation costs, training a classifier only from weakly supervised data has attracted a great deal of attention these days. Among various approaches, strengthening supervision from completely unsupervised classification is a promising direction, which typically employs class priors as the only supervision and trains a binary classifier from unlabeled (U) datasets. While existing risk-consistent methods are theoretically grounded with high flexibility, they can learn only from two U sets. In this paper, we propose a new approach for binary classification from $m$ U sets for $m \ge 2$. Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC), which is aimed at predicting from which U set each observed sample is drawn. SSC can be solved by a standard (multi-class) classification method, and we use the SSC solution to obtain the final binary classifier through a certain linear-fractional transformation. We build our method in a flexible and efficient end-to-end deep learning framework and prove it to be classifier-consistent. Through experiments, we demonstrate the superiority of our proposed method over state-of-the-art methods.
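To illustrate how a binary posterior and a set-membership posterior can be linked by a linear-fractional map, the following is a minimal sketch and not the authors' implementation. It assumes that every U set is a mixture of the same two class-conditional densities, that the class prior of each set (`thetas`), the relative set sizes (`rhos`), and the test class prior (`pi`) are known, and it applies Bayes' rule to map g(x) = p(y = +1 | x) to p(set = j | x). The function name and all parameter names are hypothetical, and the exact transformation used in the paper may be parameterized differently.

```python
def surrogate_set_posterior(g, thetas, rhos, pi):
    """Map a binary posterior g = p(y=+1 | x) to set posteriors p(set=j | x).

    Assumptions (not taken from the abstract):
      thetas[j]: class prior p(y=+1) within U set j
      rhos[j]:   fraction of all training samples that belong to U set j
      pi:        class prior p(y=+1) of the test distribution
    """
    # Unnormalized posterior of set j; it is linear in g:
    #   rho_j * ((theta_j - pi) * g + pi * (1 - theta_j))
    numer = [
        rho * ((theta - pi) * g + pi * (1.0 - theta))
        for theta, rho in zip(thetas, rhos)
    ]
    # Dividing by the sum (also linear in g) gives a linear-fractional map.
    total = sum(numer)
    return [n / total for n in numer]


# Hypothetical setup: two equally sized U sets with class priors 0.9 and 0.1.
example = surrogate_set_posterior(g=0.5, thetas=[0.9, 0.1], rhos=[0.5, 0.5], pi=0.5)
```

In a sketch like this, a network would output g(x), the map above would turn it into m set probabilities, and those would be trained with an ordinary multi-class cross-entropy loss against the index of the U set each sample came from. After training, g(x) itself would serve as the binary classifier.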
