Position: LLM alignment data should be regulated as mass media
Abstract
Most efforts to regulate and estimate the societal impacts of Large Language Models (LLMs) are aimed at model outputs. This makes regulation difficult, because outputs are stochastic and highly conditioned on diverse user prompts. This position paper draws on the communication studies literature to argue that the regulatory focus has been misplaced, and that alignment datasets (e.g., supervised fine-tuning data and preference pairs) should be regulated at the same level as mass media content such as newspaper articles or television advertising. Post-training alignment data has a direct influence on all user interactions with a model, representing the same one-to-many communication flow as traditional mass media. At the same time, mass media regulation has for decades balanced the need for audience protection with room for pluralist perspectives, providing a source of learning and inspiration for LLM regulation. Regulating post-training alignment data as mass media content is the most direct and actionable route to pluralism and accountability in LLM development and deployment.