Exploiting Worker Correlation for Label Aggregation in Crowdsourcing

Yuan Li · Benjamin Rubinstein · Trevor Cohn

Pacific Ballroom #240

Keywords: [ Graphical Models ] [ Generative Models ] [ Crowdsourcing ] [ Bayesian Methods ]


Crowdsourcing has emerged as a core component of data science pipelines. From collected noisy worker labels, aggregation models that incorporate worker reliability parameters aim to infer a latent true annotation. In this paper, we argue that existing crowdsourcing approaches do not sufficiently model worker correlations observed in practical settings; we propose in response an enhanced Bayesian classifier combination (EBCC) model, with inference based on a mean-field variational approach. An introduced mixture of intra-class reliabilities (connected to tensor decomposition and item clustering) induces inter-worker correlation. EBCC does not suffer the limitations of existing correlation models: intractable marginalisation of missing labels and poor scaling to large worker cohorts. Extensive empirical comparison on 17 real-world datasets sees EBCC achieving the highest mean accuracy across 10 benchmark crowdsourcing methods.
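To make the aggregation setting concrete, the following is a minimal sketch of inferring latent true labels from noisy worker labels using a single reliability parameter per worker. It is not EBCC and does not model worker correlation: it is a simple one-coin, EM-style baseline chosen for illustration. The function name `aggregate`, the triple-based input format, the smoothing constants, and the toy data are all assumptions, not taken from the paper.

```python
import math
from collections import defaultdict


def aggregate(triples, n_classes, n_iter=20):
    """Illustrative one-coin EM-style aggregation (NOT the paper's EBCC).

    triples: list of (item, worker, label) with label in range(n_classes).
    Missing labels are simply absent from the list.
    Returns (predicted label per item, estimated accuracy per worker).
    """
    items = sorted({i for i, _, _ in triples})
    workers = sorted({w for _, w, _ in triples})

    # Initialise item posteriors with vote fractions (soft majority vote).
    post = {i: [0.0] * n_classes for i in items}
    for i, _, l in triples:
        post[i][l] += 1.0
    for i in items:
        total = sum(post[i])
        post[i] = [p / total for p in post[i]]

    acc = {}
    for _ in range(n_iter):
        # M-step: worker accuracy is the expected fraction of correct labels,
        # with add-one smoothing (an assumption) to keep it inside (0, 1).
        num, den = defaultdict(float), defaultdict(float)
        for i, w, l in triples:
            num[w] += post[i][l]
            den[w] += 1.0
        acc = {w: (num[w] + 1.0) / (den[w] + 2.0) for w in workers}

        # E-step: recompute item posteriors under a uniform class prior,
        # spreading a worker's error mass evenly over the wrong classes.
        logp = {i: [0.0] * n_classes for i in items}
        for i, w, l in triples:
            for k in range(n_classes):
                p = acc[w] if k == l else (1.0 - acc[w]) / (n_classes - 1)
                logp[i][k] += math.log(p)
        for i in items:
            m = max(logp[i])
            expd = [math.exp(v - m) for v in logp[i]]
            total = sum(expd)
            post[i] = [v / total for v in expd]

    pred = {i: max(range(n_classes), key=lambda k: post[i][k]) for i in items}
    return pred, acc


# Toy data (hypothetical): workers "a" and "b" mostly agree, "c" is noisy.
toy = [
    (0, "a", 0), (0, "b", 0), (0, "c", 1),
    (1, "a", 1), (1, "b", 1), (1, "c", 0),
    (2, "a", 0), (2, "b", 0), (2, "c", 0),
    (3, "a", 1), (3, "b", 1),
    (4, "a", 1), (4, "c", 0),
]
pred, acc = aggregate(toy, n_classes=2)
# pred == {0: 0, 1: 1, 2: 0, 3: 1, 4: 1}
```

On item 4 the raw votes are tied, and the estimated reliabilities break the tie in favour of worker "a". A model of this kind treats workers as conditionally independent given the true label, which is the assumption the abstract argues is insufficient in practice.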
