Separating Value Disagreement from Data Uncertainty in Pluralistic Preference Data
Abstract
Pluralistic preference data entangles two operationally distinct phenomena: genuine value disagreement, which should be preserved as a multi-modal label, and under-sampled items, which need more annotation. Standard ensemble uncertainty machinery conflates the two, collapsing them into a single disagreement signal. We propose a credal disjoint-head model that learns the population-mean preference and a preference-dispersion proxy on separate gradient paths, encouraging a structural separation between the epistemic and aleatoric axes. On a synthetic generator with closed-form ground truth, our model recovers the epistemic ranking substantially better than the baseline while preserving aleatoric recovery; on a HelpSteer3 disagreement subset, it decorrelates two uncertainty estimators that the baseline leaves tightly coupled. The decomposition supports a candidate per-item routing rule that chooses between "collect more annotators" and "preserve disagreement", and a preliminary held-out annotator simulation shows that the rule routes items in the predicted direction.
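The exact form of the routing rule is not given in this summary. As a minimal sketch, assuming an ensemble whose members each emit one mean-head output (population-mean preference) and one dispersion-head output (dispersion proxy) per item, the epistemic proxy below is the spread of mean-head outputs across members and the aleatoric proxy is the average dispersion-head output. The names `ItemEstimates`, `decompose`, `route` and the thresholds `tau_epi` and `tau_ale` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pvariance

# Hypothetical route labels mirroring the two actions named in the abstract,
# plus a fallback for items that need neither action.
COLLECT = "collect more annotators"
PRESERVE = "preserve disagreement"
KEEP = "keep current label"


@dataclass
class ItemEstimates:
    # Assumed per-ensemble-member outputs of the mean head for one item.
    mean_head: list[float]
    # Assumed per-ensemble-member outputs of the dispersion head for one item.
    dispersion_head: list[float]


def decompose(est: ItemEstimates) -> tuple[float, float]:
    """Return (epistemic, aleatoric) proxies for a single item.

    Epistemic: population variance of the mean-head outputs across members,
    i.e. how much the ensemble disagrees about the population mean.
    Aleatoric: average predicted dispersion, i.e. how much annotators are
    expected to disagree with each other even with the mean pinned down.
    """
    epistemic = pvariance(est.mean_head)
    aleatoric = mean(est.dispersion_head)
    return epistemic, aleatoric


def route(est: ItemEstimates, tau_epi: float = 0.05, tau_ale: float = 0.2) -> str:
    """Hypothetical per-item routing rule with illustrative thresholds."""
    epistemic, aleatoric = decompose(est)
    # Check epistemic uncertainty first: until the mean is well estimated,
    # apparent disagreement may just be an artifact of too few annotators.
    if epistemic > tau_epi:
        return COLLECT
    if aleatoric > tau_ale:
        return PRESERVE
    return KEEP


if __name__ == "__main__":
    under_sampled = ItemEstimates([0.1, 0.6, 0.9], [0.1, 0.1, 0.1])
    contested = ItemEstimates([0.5, 0.5, 0.5], [0.4, 0.45, 0.5])
    settled = ItemEstimates([0.8, 0.8, 0.8], [0.05, 0.05, 0.05])
    for name, item in [("under_sampled", under_sampled),
                       ("contested", contested),
                       ("settled", settled)]:
        print(name, "->", route(item))
```

Ordering the epistemic check before the aleatoric one is itself an assumption of this sketch: it treats "collect more annotators" as the safer default whenever the ensemble has not yet converged on the population mean.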