Poster in Workshop: The Second Workshop on Spurious Correlations, Invariance and Stability
Cross-Risk Minimization: Inferring Group Information for Improved Generalization
Mohammad Pezeshki · Diane Bouchacourt · Mark Ibrahim · Nicolas Ballas · Pascal Vincent · David Lopez-Paz
Learning shortcuts, such as relying on spurious correlations or memorizing specific examples, makes achieving robust machine learning difficult. Invariant learning methods such as GroupDRO, which learn from multiple training groups, have been shown to be effective for obtaining more robust models. However, the high cost of annotating data with group (environment) labels limits the practicality of these algorithms. This work introduces a framework called cross-risk minimization (CRM), which automatically groups examples by their level of difficulty. As an extension of the widely used cross-validation routine, CRM uses the mistakes a model makes on held-out data as a signal to identify challenging examples. By leveraging these mistakes, CRM labels both training and validation examples into groups of differing difficulty. We provide experiments on the Waterbirds dataset, a well-known out-of-distribution (OOD) benchmark, to demonstrate that CRM infers reliable group labels. These group labels can then be used by other invariant learning methods to improve worst-group accuracy.
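As a rough illustration of the idea described above (not the paper's exact algorithm), the sketch below assigns each example to an "easy" or "hard" group based on whether a model trained on the other cross-validation split misclassifies it on held-out data. The classifier choice (scikit-learn's LogisticRegression), the two-fold split, and the binary grouping scheme are all assumptions made for illustration.

# Minimal sketch of CRM-style group inference from held-out mistakes.
# Assumptions: a simple linear classifier and a binary easy/hard grouping;
# the paper's actual procedure may differ.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

def infer_groups(X, y, n_splits=2, seed=0):
    """Label each example 0 (easy: held-out prediction correct)
    or 1 (hard: held-out prediction wrong)."""
    groups = np.zeros(len(y), dtype=int)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, held_idx in kf.split(X):
        # Train only on the other split(s), so every example
        # receives a genuinely held-out prediction.
        model = LogisticRegression(max_iter=1000)
        model.fit(X[train_idx], y[train_idx])
        preds = model.predict(X[held_idx])
        # Held-out mistakes flag difficult examples, e.g. those
        # where a learned shortcut fails to generalize.
        groups[held_idx] = (preds != y[held_idx]).astype(int)
    return groups

# Toy usage: on synthetic data, count how many examples land in each group.
X = np.random.randn(200, 10)
y = (X[:, 0] + 0.1 * np.random.randn(200) > 0).astype(int)
print(np.bincount(infer_groups(X, y), minlength=2))

The inferred group labels could then stand in for human annotations in a group-aware training method such as GroupDRO, which is the role the abstract describes for CRM's output.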