Unsupervised Learning under Latent Label Shift
Pranav Mani · Manley Roberts · Saurabh Garg · Zachary Lipton
Event URL: https://openreview.net/forum?id=CbxgFfEEP7P
What sorts of structure might enable a learner to discover classes from unlabeled data? Traditional unsupervised learning approaches risk recovering incorrect classes based on spurious data-space similarity. In this paper, we introduce unsupervised learning under Latent Label Shift (LLS), where the label marginals $p_d(y)$ shift but the class conditionals $p(\mathbf{x}|y)$ do not. This setting suggests a new principle for identifying classes: elements that shift together across domains belong to the same true class. For finite input spaces, we establish an isomorphism between LLS and topic modeling; for continuous data, we show that if each label's support contains a separable region, analogous to an anchor word, oracle access to $p(d|\mathbf{x})$ suffices to identify $p_d(y)$ and $p_d(y|\mathbf{x})$ up to permutation. Thus motivated, we introduce a practical algorithm that leverages domain-discriminative models as follows: (i) push examples through domain discriminator $p(d|\mathbf{x})$; (ii) discretize the data by clustering examples in $p(d|\mathbf{x})$ space; (iii) perform non-negative matrix factorization on the discrete data; (iv) combine recovered $p(y|d)$ with discriminator outputs $p(d|\mathbf{x})$ to compute $p_d(y|\mathbf{x}) \; \forall d$. In semi-synthetic experiments, we show that our algorithm can use domain information to overcome a failure mode of standard unsupervised classification in which data-space similarity does not indicate true groupings.
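Steps (i) to (iv) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lls_sketch`, the logistic-regression discriminator, k-means clustering, scikit-learn's `NMF`, and the default cluster count are all choices made here for brevity. Step (iv) in particular is one plausible reading of the abstract. Under label shift, $p(d|\mathbf{x}) = \sum_y p(d|y)\,p(y|\mathbf{x})$, so the sketch solves this linear system by least squares and then reweights by $p_d(y)/p(y)$. Domain labels are assumed to be integers $0, \dots, r-1$, each appearing at least once, and the number of classes is assumed known and no larger than the number of domains.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression

EPS = 1e-12  # guards against division by zero


def lls_sketch(X, d, n_classes, n_clusters=10, seed=0):
    """Hypothetical sketch of steps (i)-(iv).

    X: (n, p) features; d: (n,) integer domain labels in 0..r-1.
    Returns an estimate of p_d(y) with shape (k, r) and an estimate of
    p_d(y|x) with shape (r, n, k), both only up to a permutation of classes.
    """
    n_domains = int(d.max()) + 1

    # (i) Domain discriminator: estimate p(d | x) for every example.
    disc = LogisticRegression(max_iter=1000).fit(X, d)
    q = disc.predict_proba(X)                              # (n, r)

    # (ii) Discretize by clustering examples in p(d | x) space.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    c = km.fit_predict(q)
    counts = np.zeros((n_clusters, n_domains))
    np.add.at(counts, (c, d), 1.0)
    M = counts / counts.sum(axis=0, keepdims=True)         # column j: p(cluster | d=j)

    # (iii) NMF: M ~ W @ H, with W ~ p(cluster | y) and H ~ p_d(y).
    nmf = NMF(n_components=n_classes, init="nndsvda",
              max_iter=2000, random_state=seed)
    W = nmf.fit_transform(M)
    H = nmf.components_ * W.sum(axis=0)[:, None]           # absorb W's column scale
    p_y_given_d = (H + EPS) / (H + EPS).sum(axis=0, keepdims=True)  # (k, r)

    # (iv) Combine with discriminator outputs (assumed procedure, see lead-in).
    p_dom = np.bincount(d, minlength=n_domains) / len(d)   # p(d)
    joint = p_y_given_d * p_dom[None, :]                   # p(y, d)
    p_y = joint.sum(axis=1)                                # p(y)
    p_d_given_y = (joint / p_y[:, None]).T                 # (r, k)
    # Solve p(d | x) = sum_y p(d | y) p(y | x) for p(y | x) by least squares.
    p_y_given_x = np.linalg.lstsq(p_d_given_y, q.T, rcond=None)[0].T
    p_y_given_x = np.clip(p_y_given_x, EPS, None)
    p_y_given_x /= p_y_given_x.sum(axis=1, keepdims=True)  # (n, k)
    # Reweight per domain: p_d(y | x) is proportional to p(y | x) p_d(y) / p(y).
    w = (p_y_given_d / p_y[:, None]).T                     # (r, k)
    post = p_y_given_x[None, :, :] * w[:, None, :]         # (r, n, k)
    post /= post.sum(axis=2, keepdims=True)
    return p_y_given_d, post
```

Because classes are identifiable only up to permutation, the recovered class indices would need to be matched to ground-truth labels (for example with a small labeled set) before computing accuracy.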

Author Information

Pranav Mani (School of Computer Science, Carnegie Mellon University)

Hey! I am Pranav Mani, a master's student in the Machine Learning Department at Carnegie Mellon University, where I am advised by Professor Zachary Lipton. I did my undergrad at NIT-Trichy, India. I am interested in problems in domain shift, deep learning, NLP, and RL.

Manley Roberts (Carnegie Mellon University)

I'm a Master's in Machine Learning student at Carnegie Mellon University, where I do research on distribution shift with Prof. Zack Lipton in the ACMI Lab. Previously, I completed an undergraduate degree in Computer Science at Georgia Tech (studying Intelligence, Systems & Architecture) with a minor in Mathematics. I also interned in the summer of 2021 at IBM Research in the Hybrid Cloud division. I'm interested in tackling tricky problems in distribution shift and deep learning.

Saurabh Garg (Carnegie Mellon University)
Zachary Lipton (Carnegie Mellon University)

More from the Same Authors