Regularized Data Programming with Automated Bayesian Prior Selection
Jacqueline Maasch · Hao Zhang · Qian Yang · Fei Wang · Volodymyr Kuleshov
Event URL: https://openreview.net/forum?id=G6hyjwqUYQ

The cost of manual data labeling can be a significant obstacle in supervised learning. Data programming (DP) offers a weakly supervised solution for training dataset creation, wherein the outputs of user-defined programmatic labeling functions (LFs) are reconciled through unsupervised learning. However, DP can fail to outperform an unweighted majority vote in some scenarios, including low-data contexts. This work introduces a Bayesian extension of classical DP that mitigates failures of unsupervised learning by augmenting the DP objective with regularization terms. Regularized learning is achieved through maximum a posteriori estimation in the Bayesian model. Majority vote is proposed as a proxy signal for automated prior parameter selection. Results suggest that regularized DP improves performance relative to maximum likelihood and majority voting, confers greater interpretability, and bolsters performance in low-data regimes.
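The core idea — using the unweighted majority vote as a proxy signal to set prior parameters, then estimating labeling-function accuracies via MAP — can be illustrated with a minimal sketch. This is not the authors' actual model: the two-class {-1, +1} setup, the Beta prior on LF accuracy, and the `prior_strength` parameter are simplifying assumptions made here for illustration.

```python
import numpy as np

def majority_vote(L):
    """Unweighted majority vote over LF outputs in {-1, +1}."""
    return np.sign(L.sum(axis=1))

def map_accuracies(L, prior_strength=10.0):
    """MAP estimate of each LF's accuracy under a Beta prior whose mean is
    the LF's agreement rate with the majority vote (the proxy signal).

    L : (n_samples, n_lfs) array of LF votes in {-1, +1}.
    prior_strength : hypothetical knob for the total Beta pseudo-count mass.
    """
    mv = majority_vote(L)
    agree_rate = (L == mv[:, None]).mean(axis=0)       # agreement with MV
    alpha = prior_strength * agree_rate + 1.0          # Beta pseudo-counts
    beta = prior_strength * (1.0 - agree_rate) + 1.0   # set from the proxy
    n = L.shape[0]
    agree_counts = (L == mv[:, None]).sum(axis=0)
    # MAP mode of a Bernoulli accuracy with a Beta(alpha, beta) prior;
    # the prior terms regularize the maximum likelihood estimate.
    return (agree_counts + alpha - 1.0) / (n + alpha + beta - 2.0)

def weighted_vote(L, acc):
    """Label by log-odds-weighted vote, clipping accuracies away from 0/1."""
    acc = np.clip(acc, 1e-3, 1.0 - 1e-3)
    weights = np.log(acc / (1.0 - acc))
    return np.sign(L @ weights)
```

An LF that agrees with the majority vote only half the time gets an accuracy estimate near 0.5 and thus a near-zero weight, while consistently agreeing LFs dominate the final vote; the prior pseudo-counts keep estimates stable when few samples are available.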

Author Information

Jacqueline Maasch (Department of Computer Science, Cornell University)
Hao Zhang (Weill Cornell Medicine, Cornell University)
Qian Yang (Cornell University)
Fei Wang (Cornell University)
Volodymyr Kuleshov (Cornell Tech)