
Cross-Risk Minimization: Inferring Groups Information for Improved Generalization
Mohammad Pezeshki · Diane Bouchacourt · Mark Ibrahim · Nicolas Ballas · Pascal Vincent · David Lopez-Paz
Event URL: https://openreview.net/forum?id=G5hdQh0fFt

Shortcut learning, such as relying on spurious correlations or memorizing specific examples, makes robust machine learning difficult to achieve. Invariant learning methods such as GroupDRO, which learn from multiple training groups, have been shown to be effective for obtaining more robust models. However, the high cost of annotating data with group (environment) labels limits the practicality of these algorithms. This work introduces a framework called cross-risk minimization (CRM), which automatically groups examples by their level of difficulty. As an extension of the widely used cross-validation routine, CRM uses the mistakes a model makes on held-out data as a signal to identify challenging examples. By leveraging these mistakes, CRM can label both training and validation examples into groups with different levels of difficulty. We provide experiments on the Waterbirds dataset, a well-known out-of-distribution (OOD) benchmark, to demonstrate the effectiveness of CRM in inferring reliable group labels. These group labels can then be used by invariant learning methods to improve worst-group accuracy.
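The core idea of using held-out mistakes to infer groups can be sketched in a few lines. The following is a minimal illustrative sketch, not the paper's actual implementation: the nearest-centroid classifier, the two-group "easy"/"hard" labeling, and the fold scheme are all simplifying assumptions made here for clarity.

```python
# Sketch of cross-validation-based group inference:
# split the data into k folds, fit a simple model on the other folds,
# and label each held-out example by whether the model errs on it.
# Nearest-centroid classification is an illustrative stand-in for the
# model used in the paper.

def nearest_centroid_fit(X, y):
    # Compute one mean vector (centroid) per class label.
    centroids = {}
    for c in sorted(set(y)):
        pts = [x for x, lab in zip(X, y) if lab == c]
        centroids[c] = [sum(vals) / len(pts) for vals in zip(*pts)]
    return centroids

def nearest_centroid_predict(centroids, x):
    # Predict the class whose centroid is closest in squared distance.
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda c: sq_dist(centroids[c], x))

def crm_groups(X, y, k=2):
    # Assign every example to "easy" or "hard" based on whether a model
    # trained on the other folds classifies it correctly when held out.
    n = len(X)
    groups = [None] * n
    folds = [list(range(start, n, k)) for start in range(k)]
    for held_out in folds:
        train_idx = [i for i in range(n) if i not in held_out]
        centroids = nearest_centroid_fit([X[i] for i in train_idx],
                                         [y[i] for i in train_idx])
        for i in held_out:
            pred = nearest_centroid_predict(centroids, X[i])
            groups[i] = "easy" if pred == y[i] else "hard"
    return groups

# Toy usage: four well-separated points plus one mislabeled-looking
# outlier, which a held-out model misclassifies and flags as "hard".
X = [[0.0], [0.1], [1.0], [1.1], [1.0]]
y = [0, 0, 1, 1, 0]
print(crm_groups(X, y))  # the last example lands in the "hard" group
```

In a full pipeline, these inferred group labels would then be passed to a group-aware method such as GroupDRO in place of human annotations.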

Author Information

Mohammad Pezeshki (Meta (FAIR))
Diane Bouchacourt (Meta)
Mark Ibrahim (Fundamental AI Research (FAIR), Meta AI)
Nicolas Ballas (Université de Montréal)
Pascal Vincent (University of Montreal)
David Lopez-Paz (Facebook AI Research)
