It’s Up to Interpretation: Aligning to One’s Ever-Shifting Internal State
Abstract
Every reader is constantly changing; the same text may be received differently by the same person across affective states, attentional contexts, and frames of reference. Current alignment work recognizes the importance of pluralistic perspectives across individuals and groups, yet often treats interpretation as stable within an individual. We argue for a finer unit of alignment: internal state. Drawing on cognitive psychology, we conduct studies with language models as annotators to show that distinct affective states produce divergent preferences that aggregation obscures. We find that standard inter-annotator agreement diagnostics cannot distinguish this structured divergence from random noise. We discuss implications for preference data collection, downstream applications, and the study of how internal states shape miscommunication.