Position: Breaking the Dual Curse of Multilingual AI Requires Socio-Technical Guardrails, Not Post-Hoc Alignment
Abstract
Large language models are deployed globally as universal systems, yet their safety mechanisms remain English-optimized. This creates a Dual Curse for speakers of low-resource languages: a Harmfulness Curse, in which harmful content generation rises from 1\% in English to 35\% in languages such as Hausa, Igbo, and Javanese, and a Relevance Curse, in which instruction-following accuracy drops by 20 percentage points. These systems are thus simultaneously more dangerous and less useful. Drawing on a PRISMA-guided systematic review of 207 studies, we demonstrate that this disparity stems from a pre-training bottleneck: reward models achieve only 49--50\% accuracy in low-resource languages, no better than random chance, rendering post-hoc alignment structurally ineffective. These technical failures become governance hazards: at least 22 countries mandate automated content moderation, creating an infrastructure that is exploitable for censorship. We therefore propose a socio-technical framework to address this inequity: (1) safety context distillation during pre-training, which achieves 78--89\% harm reduction; (2) participatory harm specification by affected communities; and (3) evaluation metrics that jointly track attack resistance and false refusal rates across languages.
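To make point (3) concrete, the sketch below shows one way such a joint metric could be computed: a model only counts as safe in a given language if it both resists adversarial prompts and avoids falsely refusing benign ones. This is a minimal illustration, not the paper's evaluation code; all function names, data structures, and the toy figures (echoing the 1\% vs. 35\% harm rates above) are hypothetical.

```python
# Illustrative sketch: jointly tracking attack resistance and false refusal
# rate per language. All names and data here are hypothetical examples.
from dataclasses import dataclass

@dataclass
class SafetyUtilityReport:
    language: str
    attack_resistance: float   # 1 - attack success rate on adversarial prompts
    false_refusal_rate: float  # share of benign prompts wrongly refused

def evaluate_language(language: str,
                      attack_outcomes: list[bool],
                      benign_refusals: list[bool]) -> SafetyUtilityReport:
    """attack_outcomes[i] is True if adversarial prompt i elicited harm;
    benign_refusals[j] is True if benign prompt j was refused."""
    asr = sum(attack_outcomes) / len(attack_outcomes)
    frr = sum(benign_refusals) / len(benign_refusals)
    return SafetyUtilityReport(language, 1.0 - asr, frr)

# Reporting both axes together prevents a model from "passing" by
# blanket-refusing everything in a low-resource language.
reports = [
    evaluate_language("en", [False] * 99 + [True], [False] * 100),
    evaluate_language("ha", [True] * 35 + [False] * 65,
                      [True] * 20 + [False] * 80),
]
for r in reports:
    print(f"{r.language}: resistance={r.attack_resistance:.2f}, "
          f"false refusals={r.false_refusal_rate:.2f}")
```

Tracking the two rates as a pair, rather than a single aggregate score, makes the safety--utility trade-off in each language directly visible.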