

Poster

Position Paper: Data Authenticity, Consent, & Provenance for AI are all broken: what will it take to fix them?

Shayne Longpre · Robert Mahari · Naana Obeng-Marnu · William Brannon · Tobin South · Katy Gero · Alex Pentland · Jad Kabbara


Abstract:

New AI capabilities are owed in large part to massive, widely sourced, and under-documented training data collections. Dubious collection practices have spurred crises in data transparency, authenticity, consent, privacy, representation, bias, copyright infringement, and the overall development of ethical and trustworthy AI systems. In response, AI regulation increasingly emphasizes training data transparency as a means of understanding AI models' limitations. Based on a large-scale analysis of the AI training data landscape and existing solutions, we identify the missing infrastructure needed to support responsible AI development practices. We explain why existing tools for data authenticity, consent, and documentation cannot, on their own, solve the core problems facing the AI community, and we outline how policymakers, developers, and data creators can facilitate responsible AI development by adopting universal data provenance standards.
