
Affinity Workshop: Queer in AI @ ICML 2022 Affinity Workshop

Invited Talk 3 (Kyra Yee): A Keyword Based Approach to Understanding the Overpenalization of Marginalized Groups by English Marginal Abuse Modeling on Twitter

Kyra Yee


Harmful content detection models tend to have higher false positive rates for content from marginalized groups. Such disproportionate penalization poses the risk of reduced visibility, where marginalized communities lose the opportunity to voice their opinions online. Current approaches to algorithmic harm mitigation are often ad hoc and subject to human bias. We make two main contributions in this paper. First, we design a novel methodology that provides a principled approach to detecting the severity of potential harms associated with a text-based model. Second, we apply our methodology to audit Twitter’s English marginal abuse model. Without using demographic labels or dialect classifiers, which pose substantial privacy and ethical concerns, we are still able to detect and measure the severity of issues related to the over-penalization of the speech of marginalized communities, such as the use of reclaimed speech, counterspeech, and identity-related terms. To mitigate the associated harms, we experiment with adding true negative examples to the training data. We find that doing so improves our fairness metrics without large degradations in model performance. Lastly, we discuss challenges to marginal abuse modeling on social media in practice.
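As a rough illustration of the kind of disparity the abstract describes, the sketch below compares a classifier's false positive rate on texts containing a given keyword against its overall false positive rate. It is not the methodology from the talk: the function names (`false_positive_rate`, `keyword_fpr`), the whitespace tokenization, and the toy data are all assumptions made for this example.

```python
def false_positive_rate(labels, preds):
    """FPR = FP / (FP + TN), over examples whose true label is non-abusive (0).

    Returns None when there are no true negatives to measure against.
    """
    negatives = [p for y, p in zip(labels, preds) if y == 0]
    if not negatives:
        return None
    return sum(negatives) / len(negatives)


def keyword_fpr(texts, labels, preds, keywords):
    """Return the overall FPR and the FPR restricted to texts containing each keyword.

    Hypothetical helper: matching is a simple lowercase whitespace-token
    lookup, which a real audit would likely replace with something sturdier.
    """
    overall = false_positive_rate(labels, preds)
    report = {}
    for kw in keywords:
        idx = [i for i, t in enumerate(texts) if kw in t.lower().split()]
        report[kw] = false_positive_rate(
            [labels[i] for i in idx], [preds[i] for i in idx]
        )
    return overall, report


# Made-up toy data: label 1 = abusive, 0 = not abusive; preds are model decisions.
texts = [
    "proud to be queer",
    "queer joy today",
    "have a nice day",
    "lovely weather",
    "you are awful",
    "good morning all",
]
labels = [0, 0, 0, 0, 1, 0]
preds = [1, 0, 0, 0, 1, 0]

overall, report = keyword_fpr(texts, labels, preds, ["queer"])
print(overall, report)
```

On this toy data, one of the five non-abusive texts is flagged overall, while one of the two non-abusive texts containing the keyword is flagged, so the keyword subset shows the higher false positive rate. Adding more correctly labeled non-abusive examples that contain such terms to the training data is the kind of mitigation the abstract mentions.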
