Position: Large Language Models Should Learn Personalized Rather Than Aggregated Human Preferences
Abstract
This position paper argues that \textbf{large language models should transition from learning aggregated human preferences to learning personalized, individual preferences}. Current approaches to training language models with reinforcement learning from human feedback (RLHF) aggregate diverse human preferences into a single reward model, fundamentally limiting their ability to serve heterogeneous user populations. This aggregation masks critical information about preference diversity, individual values, and contextual dependencies, effectively optimizing models for a hypothetical ``average user'' who may not exist. We critically examine these limitations, analyze the rich structure that human preferences encode, and make the case for personalized, adaptive language model systems. While personalization offers substantial benefits for diverse user populations, it also introduces serious safety risks, including manipulation, filter bubbles, and value lock-in. We discuss these risks in depth, present alternative views and counterarguments to our position, and issue a concrete call to action for the responsible development of preference-aware models that respect both individual autonomy and collective safety.