

Poster in Workshop: 2nd Workshop on Generative AI and Law (GenLaw ’24)

Tracing datasets usage in the wild with data taggants

Wassim Bouaziz · El-Mahdi El-Mhamdi · Nicolas Usunier


Abstract:

Voices in the scientific community and legal institutions such as the European Parliament have asked model providers to disclose which datasets were used to train their models. But model providers could unintentionally omit certain training datasets, or even be unaware that they used unauthorized data. We propose a method for dataset tracing that relies on a statistical test and requires only API access to the model.
