Around the World in Eighty Ratings? Quantifying the Salience of Geo-Cultural Values for Pluralistic Alignment
Arkadiy Saakyan ⋅ Charvi Rastogi ⋅ Lora Aroyo
Abstract
Safe global deployment of AI models requires alignment with pluralistic human values, yet the rater pools in existing safety evaluation datasets remain largely homogeneous along geo-cultural dimensions. Through a meta-analysis of existing safety datasets, we observe that the vast majority do not include any geo-cultural information, and those that do lack a robust approach to collecting and understanding cultural differences in safety ratings. Using the Inglehart-Welzel dimensions of cross-cultural variation, we demonstrate via hierarchical linear modeling that geo-cultural values predict safety ratings significantly better than demographic factors alone ($p<0.05$ in $6$ datasets). Further, our analysis shows that several safety datasets contain at least 10\% culturally sensitive items, for which a lack of cultural representation in the rater pool would lead to a false negative in safety classification. Finally, we provide empirical evidence that fine-tuned LLMs can identify culturally sensitive items but cannot reliably emulate the judgments of raters from diverse cultural backgrounds, underscoring the critical need for continuous geo-culturally stratified (pluralistic) safety evaluations.