Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification
Nan Lu · Shida Lei · Gang Niu · Issei Sato · Masashi Sugiyama

Thu Jul 22 06:25 AM -- 06:30 AM (PDT)
To cope with high annotation costs, training a classifier only from weakly supervised data has attracted a great deal of attention. Among various approaches, strengthening supervision from completely unsupervised classification is a promising direction, which typically employs class priors as the only supervision and trains a binary classifier from unlabeled (U) datasets. While existing risk-consistent methods are theoretically grounded with high flexibility, they can learn only from two U sets. In this paper, we propose a new approach for binary classification from $m$ U sets for $m \ge 2$. Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC), which aims to predict from which U set each observed sample is drawn. SSC can be solved by a standard (multi-class) classification method, and we use the SSC solution to obtain the final binary classifier through a certain linear-fractional transformation. We build our method in a flexible and efficient end-to-end deep learning framework and prove it to be classifier-consistent. Through experiments, we demonstrate the superiority of our proposed method over state-of-the-art methods.
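
The link between the binary posterior and the SSC posterior can be illustrated with a minimal sketch. It assumes that each U set j is a mixture of the two class-conditional densities with a known class prior theta_j, that rho_j is the fraction of all samples coming from set j, and that pi is the class prior of the test distribution. Under those assumptions, Bayes' rule gives a set posterior that is a ratio of two functions linear in g(x) = p(y=+1 | x), which is one instance of a linear-fractional transformation. The function name and all symbols below are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import numpy as np

def set_posteriors(g, theta, rho, pi):
    """Map a binary posterior g = p(y=+1 | x) to posteriors over m U sets.

    Assumed (hypothetical) inputs:
      g     : scalar or array of binary posteriors in [0, 1]
      theta : length-m class priors of the U sets, theta_j = p_j(y=+1)
      rho   : length-m set proportions (sum to 1)
      pi    : class prior of the test distribution, 0 < pi < 1
    """
    g = np.asarray(g, dtype=float)[..., None]   # add a trailing axis for the m sets
    theta = np.asarray(theta, dtype=float)
    rho = np.asarray(rho, dtype=float)
    # p_j(x) / p(x) = a_j * g(x) + b_j follows from the mixture assumption
    a = (theta - pi) / (pi * (1.0 - pi))
    b = (1.0 - theta) / (1.0 - pi)
    numer = rho * (a * g + b)
    # Normalizing over sets yields a ratio of two linear functions of g
    return numer / numer.sum(axis=-1, keepdims=True)

# Example call: three U sets, two query points
probs = set_posteriors([0.2, 0.9], theta=[0.8, 0.5, 0.1], rho=[0.2, 0.3, 0.5], pi=0.4)
```

In a training loop of this kind, one would typically apply such a map to the output of a binary model and fit the result to the observed set indices with a standard multi-class loss, so that the binary model is learned without any class labels.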

Author Information

Nan Lu (The University of Tokyo/RIKEN)

Nan Lu is a Ph.D. student at the Department of Complexity Science and Engineering, the University of Tokyo. Her research interests lie in the fields of weakly supervised learning, learning with real-world constraints, and deep learning.

Shida Lei (The University of Tokyo)
Gang Niu (RIKEN)

Gang Niu is currently a research scientist (indefinite-term) at RIKEN Center for Advanced Intelligence Project. He received the PhD degree in computer science from Tokyo Institute of Technology in 2013. Before joining RIKEN as a research scientist, he was a senior software engineer at Baidu and then an assistant professor at the University of Tokyo. He has published more than 70 journal articles and conference papers, including 14 NeurIPS (1 oral and 3 spotlights), 28 ICML, and 2 ICLR (1 oral) papers. He has served as an area chair 14 times, including ICML 2019--2021, NeurIPS 2019--2021, and ICLR 2021--2022.

Issei Sato (The University of Tokyo / RIKEN)
Masashi Sugiyama (RIKEN / The University of Tokyo)