AutoBiasTest: Controllable Test Sentence Generation for Open-Ended Social Bias Testing in Language Models at Scale
Rafal Kocielnik · Shrimai Prabhumoye · Vivian Zhang · R. Alvarez · Anima Anandkumar
Event URL: https://openreview.net/forum?id=ggMyGIZG0O

Social bias in Pretrained Language Models (PLMs) affects text generation and other downstream NLP tasks. Existing bias testing methods rely predominantly on manual templates or on expensive crowd-sourced data. We propose AutoBiasTest, a method that automatically generates controlled sentences for testing bias in PLMs, providing a flexible and low-cost alternative. Our approach uses another PLM for generation, controlled by conditioning on social group and attribute terms. We show that the generated sentences are natural and similar to human-produced content in terms of word length and diversity. We find that our bias scores are well correlated with those obtained from manual templates, but AutoBiasTest also highlights biases that these templates do not capture, because its test contexts are more diverse and realistic. By automating large-scale test sentence generation, we enable better estimation of underlying bias distributions.
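
As a rough illustration of what "conditioning on social group and attribute terms" could look like in practice, the Python sketch below builds a generation prompt from one group term and one attribute term, then keeps only candidate sentences that mention both. The prompt wording, the function names (`build_prompt`, `mentions_both`, `keep_controlled`), and the example terms are all hypothetical; the abstract does not specify them, and this is not the authors' implementation.

```python
import re


def build_prompt(group_term: str, attribute_term: str) -> str:
    """Build a conditioning prompt (hypothetical wording, not from the paper)."""
    return (
        f"Write a natural sentence that mentions '{group_term}' "
        f"and '{attribute_term}'."
    )


def mentions_both(sentence: str, group_term: str, attribute_term: str) -> bool:
    """Return True if both terms occur as whole words, ignoring case."""
    def has(term: str) -> bool:
        pattern = rf"\b{re.escape(term)}\b"
        return re.search(pattern, sentence, re.IGNORECASE) is not None

    return has(group_term) and has(attribute_term)


def keep_controlled(candidates, group_term, attribute_term):
    """Assumed filtering step: drop generations missing either term."""
    return [s for s in candidates if mentions_both(s, group_term, attribute_term)]
```

In such a setup, the prompt would be passed to whichever generator PLM is in use, and `keep_controlled` would be applied to its outputs before any bias scoring.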

Author Information

Rafal Kocielnik (California Institute of Technology)

Rafal was born in Warsaw, Poland. After completing his undergraduate studies in Computer Science in Poland, he pursued his passion for HCI by enrolling in the PhD program at the University of Washington in Seattle. His doctoral thesis, titled "Designing Engaging Conversational Interactions for Health & Behavior Change," explored the value and challenges of applying conversational user interfaces for self-improvement and social good. Throughout his career, Rafal has interned at leading tech companies and research labs, including Philips Research, Microsoft Research, Fuji-Xerox, and AI2. He was awarded the Computing Innovation Fellowship in 2020. Currently, Rafal is a post-doctoral researcher at Caltech, where he focuses on Human-AI interaction. His research topics include the detection of social bias in Large Language Models (LLMs), the use of LLMs for offensive content identification in social media, and the application of AI in high-stakes surgical contexts for feedback delivery to surgical trainees.

Shrimai Prabhumoye (NVIDIA)
Vivian Zhang (California Institute of Technology)
R. Alvarez
Anima Anandkumar (Caltech and NVIDIA)

Anima Anandkumar is a Bren Professor at Caltech and Director of ML Research at NVIDIA. She was previously a Principal Scientist at Amazon Web Services. She is passionate about designing principled AI algorithms and applying them to interdisciplinary domains. She has received several honors, including the IEEE Fellowship, the Alfred P. Sloan Fellowship, the NSF CAREER Award, Young Investigator Awards from the DoD, VentureBeat's “Women in AI” award, the NYTimes GoodTech award, and Faculty Fellowships from Microsoft, Google, Facebook, and Adobe. She is part of the World Economic Forum's Expert Network. She has appeared in the PBS Frontline documentary on the “Amazon empire” and has given keynotes in many forums, such as TEDx, KDD, ICLR, and ACM. Anima received her BTech from the Indian Institute of Technology Madras and her PhD from Cornell University, conducted her postdoctoral research at MIT, and held an assistant professorship at the University of California, Irvine.

More from the Same Authors