DataPerf: Benchmarking Data for Data-Centric AI

Lora Aroyo · Newsha Ardalani · Colby Banbury · Gregory Diamos · William Gaviria Rojas · Tzu-Sheng Kuo · Mark Mazumder · Peter Mattson · Praveen Paritosh

Ballroom 3


This workshop proposal builds on the success of the 1st Data-Centric AI Workshop at NeurIPS 2021, which attracted more than 160 submissions and close to 200 participants. It expands that effort by engaging the active, interdisciplinary MLCommons community of practitioners, researchers, and engineers from both academia and industry, and by presenting the current state of the art, work in progress, and a set of open problems in the field of benchmarking data for ML. Many of these areas are at a nascent stage, and we hope to further their development by knitting them together into a coherent whole. We seek to drive progress on these core problems by promoting the creation of a set of benchmarks for data quality and data-related algorithms. We want to bring together work that pushes forward this new view of data-centric ML benchmarks, for example the initiatives at MLCommons, a non-profit that operates the MLPerf benchmarks that have become standard for AI chip speed, as well as others including Dynabench, OpenML, and the data-centric AI hub. We envision MLCommons as providing a framework and resources for the evolution of benchmarks in this space, and our workshop as showcasing the best innovations revealed by those benchmarks and providing a focus event for the community submitting to them.

A huge amount of innovation (in algorithms, ideas, principles, and tools) is needed to make data-centric AI development efficient and effective. We hope that this workshop will help spark that innovation.
