

Understanding the Origins of Bias in Word Embeddings

Marc-Etienne Brunet · Colleen Alkalay-Houlihan · Ashton Anderson · Richard Zemel

Pacific Ballroom #146

Keywords: [ Natural Language Processing ] [ Interpretability ] [ Fairness ] [ Computational Social Sciences ]


Popular word embedding algorithms exhibit stereotypical biases, such as gender bias. The widespread use of these algorithms in machine learning systems can amplify stereotypes in important contexts. Although some methods have been developed to mitigate this problem, how word embedding biases arise during training is poorly understood. In this work we develop a technique to address this question. Given a word embedding, our method reveals how perturbing the training corpus would affect the resulting embedding bias. By tracing the origins of word embedding bias back to the original training documents, one can identify subsets of documents whose removal would most reduce bias. We demonstrate our methodology on Wikipedia and New York Times corpora, and find it to be very accurate.
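The question the abstract poses, how a change to the corpus changes the measured bias, can be illustrated with a brute-force sketch. This is not the authors' technique: it simply rebuilds a toy embedding with each document left out and compares a bias score before and after. Everything here is an assumption made for illustration: the four-sentence corpus, the co-occurrence-count "embedding" standing in for a real algorithm, the cosine-difference bias score, and the names `cooccurrence_vectors`, `bias`, and `leave_one_out_effects`.

```python
from collections import Counter
from itertools import permutations
import math


def cooccurrence_vectors(docs):
    """Toy embedding (assumption): a word's vector is its within-document co-occurrence counts."""
    vecs = {}
    for doc in docs:
        for word, context in permutations(doc, 2):
            vecs.setdefault(word, Counter())[context] += 1
    return vecs


def cosine(u, v):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bias(docs, target="engineer", a="he", b="she"):
    """Hypothetical bias score: how much closer `target` sits to `a` than to `b`."""
    vecs = cooccurrence_vectors(docs)
    empty = Counter()
    t = vecs.get(target, empty)
    return cosine(t, vecs.get(a, empty)) - cosine(t, vecs.get(b, empty))


def leave_one_out_effects(docs):
    """Rebuild the embedding without each document; return (index, change in |bias|), ascending."""
    base = abs(bias(docs))
    effects = []
    for i in range(len(docs)):
        perturbed = docs[:i] + docs[i + 1:]
        effects.append((i, abs(bias(perturbed)) - base))
    return sorted(effects, key=lambda e: e[1])


# Made-up corpus, one tokenized "document" per line.
docs = [
    "he is an engineer".split(),
    "he works as an engineer".split(),
    "she is a nurse".split(),
    "she is an engineer".split(),
]

ranked = leave_one_out_effects(docs)
for index, delta in ranked:
    print(f"remove doc {index}: change in |bias| = {delta:+.3f}")
```

Documents at the top of the printed ranking are the ones whose removal lowers the toy score the most. Rebuilding the embedding once per document is only feasible at this scale, which is why a method that reveals the effect of a perturbation from the given embedding matters for corpora the size of Wikipedia or the New York Times.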
