Poster in Workshop: Data-centric Machine Learning Research (DMLR): Datasets for Foundation Models
Training-free Design of Augmentations with Data-centric Principles
Jieke Wu · Wei Huang · Mingyuan Bai · Xiaoling Hu · Yi Duan · Wuyang Chen
The remarkable advancements in Artificial Intelligence (AI) and Deep Learning owe much to the evolution of informative datasets. With the emerging concept of "Data-centric AI", the focus has shifted from developing deep neural networks (DNNs) to crafting high-quality training datasets. However, current data-centric approaches rely predominantly on empirical heuristics or costly DNN training, and lack established design principles. Our work concentrates on data augmentation, a key technique for enhancing data quality. Grounded in recent developments in deep learning theory, we discover principled metrics that effectively gauge both data quality and its interaction with DNNs. Crucially, these metrics can be calculated without extensive DNN training, enabling training-free augmentation design with minimal computational cost. Comprehensive experiments validate that our principles align strongly with the optimal augmentation choices used in practice. Our method is particularly beneficial in domain-specific fields such as medical image analysis, where the optimal augmentation strategy and the data's inductive bias are often unclear. Our results demonstrate consistent improvements over existing state-of-the-art segmentation methods across various medical imaging datasets. We attach our code at: https://anonymous.4open.science/r/240523anonymousrepo-C828/.
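The abstract does not spell out the specific training-free metrics, so the sketch below is only an illustration of the general workflow it describes: scoring candidate augmentation policies with a proxy computed from an untrained network, then picking the highest-scoring policy without any gradient-based training. The proxy used here (the effective rank of a feature Gram matrix) and all names in the snippet are assumptions for illustration, not the authors' actual metrics.

```python
# Hypothetical sketch of training-free augmentation ranking.
# Assumption: a proxy score from an *untrained* feature extractor
# (effective rank of the feature Gram matrix) stands in for the
# paper's unspecified data-centric metrics.
import torch
import torch.nn as nn
import torchvision.transforms as T


def feature_gram(model, images):
    """Gram matrix of features from a randomly initialized model (no training)."""
    with torch.no_grad():
        feats = model(images)                     # (N, D) feature vectors
        feats = feats - feats.mean(0, keepdim=True)
    return feats @ feats.T                        # (N, N) Gram matrix


def effective_rank(gram, eps=1e-12):
    """Entropy-based effective rank of a PSD matrix, used as a proxy score."""
    eigvals = torch.linalg.eigvalsh(gram).clamp(min=0)
    p = eigvals / (eigvals.sum() + eps)
    entropy = -(p * (p + eps).log()).sum()
    return entropy.exp().item()


# Untrained feature extractor: no gradient steps are ever taken.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

# Candidate augmentation policies to rank (illustrative choices only).
candidates = {
    "flip": T.RandomHorizontalFlip(p=1.0),
    "crop": T.RandomResizedCrop(32, scale=(0.6, 1.0)),
    "jitter": T.ColorJitter(brightness=0.5, contrast=0.5),
}

images = torch.rand(64, 3, 32, 32)                # stand-in for a real data batch
scores = {}
for name, aug in candidates.items():
    augmented = torch.stack([aug(img) for img in images])
    scores[name] = effective_rank(feature_gram(backbone, augmented))

# Higher proxy score -> augmentation judged more informative,
# all without training the network.
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In this hypothetical setup, the only cost per candidate policy is a single forward pass through a randomly initialized backbone, which is what "training-free with minimal computation" would look like in practice; the paper's actual principled metrics may differ substantially from this proxy.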