Exploiting Worker Correlation for Label Aggregation in Crowdsourcing
Yuan Li · Benjamin Rubinstein · Trevor Cohn

Wed Jun 12 06:30 PM -- 09:00 PM (PDT) @ Pacific Ballroom #240

Crowdsourcing has emerged as a core component of data science pipelines. From collected noisy worker labels, aggregation models that incorporate worker reliability parameters aim to infer a latent true annotation. In this paper, we argue that existing crowdsourcing approaches do not sufficiently model worker correlations observed in practical settings; we propose in response an enhanced Bayesian classifier combination (EBCC) model, with inference based on a mean-field variational approach. An introduced mixture of intra-class reliabilities---connected to tensor decomposition and item clustering---induces inter-worker correlation. EBCC does not suffer the limitations of existing correlation models: intractable marginalisation of missing labels and poor scaling to large worker cohorts. Extensive empirical comparison on 17 real-world datasets sees EBCC achieving the highest mean accuracy across 10 benchmark crowdsourcing methods.

Author Information

Yuan Li (University of Melbourne)
Benjamin Rubinstein (University of Melbourne)

Ben joined the University of Melbourne in 2013 as a Senior Lecturer in Computing and Information Systems. Previously he gained four years of industry experience in the research divisions of Microsoft, Google, Intel, Yahoo! and IBM. He has shipped production systems for entity resolution in Bing and the Xbox, identified and plugged side-channel attacks against the popular Firefox browser, and [deanonymised](http://www.health.gov.au/internet/main/publishing.nsf/Content/mr-yr16-dept-dept005.htm) an unprecedented Australian Medicare data release, prompting introduction of the [Re-identification Offence Bill 2016](http://www.smh.com.au/national/public-service/can-the-government-really-protect-your-privacy-when-it-deidentifies-public-data-20161204-gt3nny.html). He actively researches topics in machine learning, security & privacy, and databases, such as adversarial learning, differential privacy and record linkage. Ben earned a PhD from UC Berkeley under Peter Bartlett in 2010.

Trevor Cohn (University of Melbourne)
