Let the Prototype Guide You: Robust Aggregation of Sparse Multi-Class Annotations via Annotator Prototype Learning
Abstract
Truth inference is a critical technique for aggregating noisy and biased multi-class classification annotations. State-of-the-art approaches model each annotator with an individual confusion matrix. While well-grounded, they suffer from two fundamental bottlenecks: 1) confusion matrices are underfit when annotators label only a small subset of tasks or when classes are imbalanced, and 2) a single confusion matrix per annotator is inadequate for capturing complex annotator behaviors, leading to class-level collapse when tasks are extremely difficult. Addressing both challenges simultaneously is non-trivial, as it demands both robustness to data sparsity and sufficient expressiveness for complex annotator patterns. In this paper, we propose CPBCC (Class-specific Prototype-driven Bayesian Classifier Combination), which models annotators through a dual-pathway architecture: (i) learning class-specific prototype annotation patterns shared across all annotators, and (ii) learning annotator-specific weights over these prototypes. This design addresses both bottlenecks and achieves a robust yet expressive annotator characterization. Experiments on 10 real-world datasets spanning five domains demonstrate that CPBCC yields a 26% accuracy improvement in the best case and boosts average accuracy from 68.73% to 74.11%.
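The dual-pathway idea described above can be illustrated with a minimal sketch: each annotator's effective confusion matrix is a weighted combination of a small bank of prototype confusion matrices shared across all annotators. All sizes and variable names below are hypothetical illustrations, not the paper's actual parameterization or inference procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: C classes, K shared prototypes, A annotators.
C, K, A = 4, 3, 10

# Pathway (i): K prototype confusion matrices shared by all annotators.
# Rows index the true class, columns the reported class; each row is a
# categorical distribution (rows sum to 1).
prototypes = rng.dirichlet(np.ones(C), size=(K, C))  # shape (K, C, C)

# Pathway (ii): per-annotator mixture weights over the prototypes,
# made class-specific so each true class can mix prototypes differently.
weights = rng.dirichlet(np.ones(K), size=(A, C))     # shape (A, C, K)

# An annotator's effective confusion matrix is a convex combination of
# the shared prototypes. Each annotator contributes only C*K weights
# instead of C*C free probabilities, which is why the characterization
# stays robust when that annotator's labels are sparse.
confusion = np.einsum('ack,kcd->acd', weights, prototypes)  # (A, C, C)

# Convex combinations of row-stochastic matrices remain row-stochastic.
assert np.allclose(confusion.sum(axis=-1), 1.0)
```

Because the prototypes pool evidence across every annotator while the weights remain annotator-specific, the mixture can express multiple behavior modes per annotator without fitting a full confusion matrix from each annotator's (possibly tiny) label set.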