ICML Poster Linear Adversarial Concept Erasure

Shauli Ravfogel · Michael Twiton · Yoav Goldberg · Ryan Cotterell

Hall E #538

Keywords: [ SA: Fairness, Equity, Justice and Safety ] [ SA: Trustworthy Machine Learning ] [ DL: Other Representation Learning ] [ MISC: Representation Learning ] [ SA: Privacy-preserving Statistics and Machine Learning ] [ APP: Language, Speech and Dialog ] [ MISC: General Machine Learning Techniques ] [ Miscellaneous Aspects of Machine Learning ]

Abstract:

Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly used in real-world applications, the inability to control their content becomes a growing problem. In this work, we study the problem of identifying a linear subspace that corresponds to a given concept and removing it from the representation. We formulate this problem as a constrained, linear minimax game and show that existing solutions are generally not optimal for this task. We derive a closed-form solution for certain objectives and propose a convex relaxation that works well for others. When evaluated on binary gender removal, the method recovers a low-dimensional subspace whose removal mitigates bias under both intrinsic and extrinsic evaluation. Surprisingly, the method, despite being linear, is highly expressive: it effectively mitigates bias in the output layers of deep, nonlinear classifiers while remaining tractable and interpretable.
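To make "removing a linear subspace from the representation" concrete, here is a minimal sketch of the erasure step alone. It is not the minimax method described in the abstract: as a stand-in it takes the difference of class means as the concept direction (a simple baseline, and an assumption of this example) and applies the orthogonal projection that zeroes out that direction. The function names and the synthetic data are hypothetical and for illustration only.

```python
import numpy as np


def nullspace_projection(basis: np.ndarray) -> np.ndarray:
    """Return the orthogonal projection P = I - Q Q^T, where the columns
    of Q are an orthonormal basis for the span of `basis` (shape d x k)."""
    q, _ = np.linalg.qr(basis)  # orthonormalize the concept directions
    d = basis.shape[0]
    return np.eye(d) - q @ q.T


def mean_difference_direction(x: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Difference between the class means of x for a binary label z."""
    return x[z == 1].mean(axis=0) - x[z == 0].mean(axis=0)


# Synthetic representations (assumption): the binary concept z leaks
# into the first coordinate of otherwise Gaussian vectors.
rng = np.random.default_rng(0)
n, d = 200, 5
z = rng.integers(0, 2, size=n)
x = rng.normal(size=(n, d))
x[:, 0] += 3.0 * z

# Baseline concept direction; the paper instead obtains the subspace
# by solving a constrained minimax game.
u = mean_difference_direction(x, z)
P = nullspace_projection(u[:, None])

# P is symmetric, so right-multiplying projects every row of x.
x_clean = x @ P

gap_before = np.linalg.norm(mean_difference_direction(x, z))
gap_after = np.linalg.norm(mean_difference_direction(x_clean, z))
print(f"class-mean gap before: {gap_before:.3f}, after: {gap_after:.2e}")
```

Removing this one direction makes the class means coincide, but it does not by itself guarantee that no linear classifier can recover the concept. The abstract's observation that existing solutions are generally not optimal is the motivation for choosing the projection adversarially.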