Poster in Workshop: Data-centric Machine Learning Research (DMLR): Datasets for Foundation Models
PabLO: Improving Semi-Supervised Learning with Pseudolabeling Optimization
Harit Vishwakarma · Yi Chen · Satya Sai Srinath Namburi GNVV · Sui Jiet Tay · Ramya Vinayak · Frederic Sala
Modern semi-supervised learning (SSL) methods frequently rely on pseudolabeling and consistency regularization. The main technical challenge in pseudolabeling is identifying the points that can be labeled reliably. Existing methods choose these points using ad hoc or hand-crafted notions of confidence and threshold selection functions. Though such hand-designed strategies perform well on benchmark datasets, they may not fare well in specialized settings. To address this challenge, we propose a framework that learns confidence functions and thresholds explicitly aligned with the SSL task, obviating the need for manual design. Our approach formulates an optimization problem over a flexible space of confidence functions and thresholds, allowing us to obtain optimal scoring functions while remaining compatible with the most popular and performant SSL techniques today. Extensive empirical evaluation of our method shows up to 11% improvement in test accuracy over the standard baselines while requiring substantially fewer training iterations.
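To make the role of the confidence function and threshold concrete, the following is a minimal sketch of generic confidence-thresholded pseudolabeling. It is not the authors' PabLO implementation. The hand-crafted baseline shown is the common maximum-softmax-probability score, and `pick_threshold` is one simple, assumed way to select a threshold on held-out labeled data (the smallest threshold whose selected points stay under an error tolerance, which maximizes coverage). All function names and parameters here are hypothetical, and a learned scoring function would be passed in place of `max_prob_confidence`.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def max_prob_confidence(probs):
    # Hand-crafted confidence: the largest predicted class probability.
    return probs.max(axis=1)

def pseudolabel(probs, confidence_fn, threshold):
    # Returns predicted labels and a mask of points confident enough to use.
    # confidence_fn is pluggable, so a learned scoring function
    # (hypothetical here) could replace the hand-crafted one.
    scores = confidence_fn(probs)
    mask = scores >= threshold
    labels = probs.argmax(axis=1)
    return labels, mask

def pick_threshold(scores, correct, max_error, candidates):
    # Assumed selection rule, not taken from the paper: scan candidates in
    # ascending order and return the first (lowest) threshold whose selected
    # held-out points have error rate <= max_error. Coverage can only shrink
    # as the threshold grows, so the first hit has the highest coverage.
    for t in sorted(candidates):
        selected = scores >= t
        if selected.sum() == 0:
            continue
        error = 1.0 - correct[selected].mean()
        if error <= max_error:
            return t
    return None  # no candidate satisfies the error tolerance
```

In a training loop, only the points where `mask` is true would contribute pseudolabels to the unlabeled loss; the abstract's proposal corresponds to learning both the scoring function and the threshold rather than fixing them by hand.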