A Bayesian Approach to Quantify the Uncertainty of Human Ratings in a Single-Instance Multimodal Framework
Abstract
Human ratings are central to learning and inference across several application domains, but they are also subject to inter-rater biases and judgment errors. Quantifying the uncertainty of these human ratings would require repeated measurements, which are expensive and rarely available at scale. We propose a Bayesian graphical model to estimate the instance-level and item-level uncertainty of (subjective) human ratings by leveraging auxiliary (objective) data. Our model learns a shared latent content representation that explains factors common to both the human ratings and the auxiliary data, together with a latent uncertainty variable that captures fluctuations in the human assessments via a data-conditioned prior. We develop a scalable amortized variational inference procedure that uses modality-appropriate neural encoders and decoders to represent the posterior factors. Experiments on synthetic data demonstrate that our framework can accurately recover the latent uncertainty under targeted ablations and stress tests. We further validate our approach on a real-world dataset of paired functional MRI scans and behavioral testing for autism, thus highlighting the need for uncertainty quantification.