Poster in Workshop: Principles of Distribution Shift (PODS)
Task Modeling: A Multitask Approach for Improving Robustness to Group Shifts
Dongyue Li · Huy Nguyen · Hongyang Zhang
We study the problem of learning from multiple groups of heterogeneous data distributions. Previous work shows that machine learning models trained under group shifts can perform poorly on groups whose training sets are small. In this work, we explore multitask learning approaches that augment the training set and optimize the worst-group performance of a target task. A critical challenge in multitask learning is identifying source tasks that benefit a given target task. To address this challenge, we propose a task modeling framework that learns a mapping from any subset of source tasks to its transferability on the target task. Our key finding is that, given the outcomes of training models on randomly subsampled subsets of source tasks, a linear task model can accurately predict the results of multitask training on the target task. This finding yields an algorithm that selects beneficial source tasks using the learned task model. We validate our approach on a tabular dataset with 50 tasks. Our task selection algorithm improves the worst-group accuracy on six target tasks by an average of 1.03% over prior methods. Our approach also applies to other performance metrics, including average performance and fairness measures, where it outperforms baselines by 0.57% and 2.09%, respectively.
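To make the task-modeling idea concrete, the following is a minimal sketch (not the paper's implementation) of the pipeline described above: encode each randomly sampled subset of source tasks as a binary indicator vector, record a transferability score for that subset (here simulated by a hypothetical linear ground truth, since actual multitask training results are not available), fit a linear task model by least squares, and select the source tasks with positive estimated effects. The variable names and the synthetic `true_effect` vector are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_source, n_subsets = 10, 200

# Hypothetical per-task effect on the target's worst-group accuracy
# (positive = beneficial). Illustrative only; in practice each subset's
# score comes from actually running multitask training.
true_effect = np.array([0.5, 0.3, -0.2, 0.1, -0.4,
                        0.2, 0.05, -0.1, 0.3, -0.3])

# Step 1: sample random subsets of source tasks (binary indicator
# vectors) and record a transferability score for each subset.
X = rng.integers(0, 2, size=(n_subsets, n_source)).astype(float)
y = X @ true_effect + 0.01 * rng.standard_normal(n_subsets)

# Step 2: fit a linear task model mapping subset -> transferability.
X1 = np.hstack([X, np.ones((n_subsets, 1))])  # add an intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
effects = coef[:n_source]

# Step 3: select source tasks with positive estimated effect.
selected = np.flatnonzero(effects > 0)
print(selected.tolist())
```

Under this synthetic setup, the linear model recovers the per-task effects from subset-level scores alone, so the selected indices coincide with the tasks given positive ground-truth effects.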