Optimal Learning from Label Proportions with General Loss Functions
Abstract
Motivated by problems in online advertising, we address the task of Learning from Label Proportions (LLP). We introduce a novel and versatile low-variance debiasing methodology to learn from aggregate label information, significantly advancing the state of the art in LLP. Our debiasing approach is flexible: it accommodates a broad spectrum of practically relevant loss functions in both binary and multi-class classification settings. By carefully combining our estimators with standard techniques, we improve sample complexity guarantees for a large class of losses of practical relevance. We also validate our approach empirically on a diverse array of benchmark datasets, demonstrating clear advantages over standard baselines.