Workshop
Data-centric Machine Learning Research (DMLR): Datasets for Foundation Models
Adam Mahdi · Ludwig Schmidt · Alexandros Dimakis · Rotem Dror · Georgia Gkioxari · Sang Truong · Lilith Bat-Leah · Fatimah Alzamzami · Georgios Smyrnis · Thao Nguyen · Nezihe Merve Gürel · Paolo Climaco · Luis Oala · Hailey Schoelkopf · Andrew M. Bean · Berivan Isik · Vaishaal Shankar · Mayee Chen · Achal Dave
Straus 3
Sat 27 Jul, midnight PDT
This workshop addresses the growing significance of preparing high-quality datasets for the development of large-scale foundation models. With recent advances highlighting the key role of dataset size, quality, diversity, and provenance in model performance, the workshop considers strategies for enhancing data quality, including filtering, augmentation, and relabeling. Drawing on the increasing interest in data-centric research, it seeks to advance understanding and methodologies for dataset composition and curation, ultimately fostering the development of more robust models that can address diverse challenges across multiple domains and benefit the public.