Skip to yearly menu bar Skip to main content


Poster

Domain-wise Data Acquisition to Improve Performance under Distribution Shift

Yue He · Dongbai Li · Pengfei Tian · Han Yu · Jiashuo Liu · Hao Zou · Peng Cui

Hall C 4-9 #2314
[ ] [ Paper PDF ]
[ Poster
Thu 25 Jul 2:30 a.m. PDT — 4 a.m. PDT

Abstract:

Despite notable progress in enhancing the capability of machine learning against distribution shifts, training data quality remains a bottleneck for cross-distribution generalization. Recently, from a data-centric perspective, there have been considerable efforts to improve model performance through refining the preparation of training data. Inspired by realistic scenarios, this paper addresses a practical requirement of acquiring training samples from various domains on a limited budget to facilitate model generalization to target test domain with distribution shift. Our empirical evidence indicates that the advance in data acquisition can significantly benefit the model performance on shifted data. Additionally, by leveraging unlabeled test domain data, we introduce a Domain-wise Active Acquisition framework. This framework iteratively optimizes the data acquisition strategy as training samples are accumulated, theoretically ensuring the effective approximation of test distribution. Extensive real-world experiments demonstrate our proposal's advantages in machine learning applications. The code is available at https://github.com/dongbaili/DAA.

Chat is not available.