Weak-to-Strong Generalization via Bregman Bias–Variance Decomposition
Abstract
Weak-to-strong generalization (W2SG) is the phenomenon in which a strong student model, trained on labels produced by a weaker teacher, ultimately outperforms the teacher on the target task. In this work, we theoretically investigate how W2SG can arise via a generalized bias–variance decomposition under Bregman divergence. We show that the expected population risk gap between the student and the teacher is characterized by the expected misfit between the two models. Unlike earlier misfit-based analyses, our theory removes several restrictive assumptions; for example, it does not require the student hypothesis class to be convex. Our results indicate that W2SG is more likely when the student closely approximates the teacher's posterior mean. Specializing to squared loss, we provide a sufficient condition, illustrated through a concrete example, under which the student converges to the teacher's posterior mean; in particular, increasing the student model size can ensure this convergence. For cross-entropy loss, our analysis further suggests that lowering the entropy of the student's predictive distribution can promote W2SG. We also find that reverse cross-entropy is less sensitive to the teacher's predictive uncertainty than the standard forward cross-entropy. Finally, we verify these theoretical insights empirically and demonstrate that incorporating reverse cross-entropy consistently improves student performance.
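To pin down the objects the abstract refers to, a minimal sketch may help; the symbols $F$, $D_F$, $f_{\mathrm{w}}$, $f_{\mathrm{s}}$, and $\mathcal{R}$ below are illustrative placeholders rather than notation taken from the paper body. For a strictly convex generator $F$, the Bregman divergence is
\[
D_F(u, v) \;=\; F(u) - F(v) - \langle \nabla F(v),\, u - v \rangle,
\]
which recovers the squared loss for $F(u) = \lVert u \rVert^2$ and the KL divergence (hence the cross-entropy family) when $F$ is the negative Shannon entropy. The misfit characterization then has the schematic shape
\[
\mathbb{E}\big[\mathcal{R}(f_{\mathrm{w}})\big] - \mathbb{E}\big[\mathcal{R}(f_{\mathrm{s}})\big]
\;\approx\;
\mathbb{E}\big[\, D_F\big(f_{\mathrm{s}}(x),\, f_{\mathrm{w}}(x)\big) \,\big],
\]
where $f_{\mathrm{w}}$ and $f_{\mathrm{s}}$ denote the weak teacher and the strong student: the student outperforms the teacher roughly by as much as it disagrees with the teacher's labels, and, per the abstract's claim, the gap is largest when the student approaches the teacher's posterior mean. The exact statement and its conditions are given in the paper body.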
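Since the abstract contrasts forward and reverse cross-entropy, a minimal PyTorch sketch of the two losses may also help. The function names, the soft-label tensor teacher_probs, and the mixing weight lam below are illustrative assumptions; the abstract does not specify the paper's actual training objective.

```python
import torch
import torch.nn.functional as F

def forward_ce(student_logits: torch.Tensor, teacher_probs: torch.Tensor) -> torch.Tensor:
    """Forward cross-entropy H(t, s) = -sum_k t_k log s_k, averaged over the batch.

    The teacher's soft labels are the target, so a high-entropy (uncertain)
    teacher row still produces a nonzero gradient that pulls the student
    toward the teacher's uncertainty.
    """
    log_s = F.log_softmax(student_logits, dim=-1)
    return -(teacher_probs * log_s).sum(dim=-1).mean()

def reverse_ce(student_logits: torch.Tensor, teacher_probs: torch.Tensor,
               eps: float = 1e-8) -> torch.Tensor:
    """Reverse cross-entropy H(s, t) = -sum_k s_k log t_k, averaged over the batch.

    The roles are swapped: the student's own distribution weights the
    teacher's log-probabilities. A maximally uncertain (uniform) teacher row
    contributes a constant log K, hence zero gradient for the student, which
    illustrates the abstract's claim of reduced sensitivity to teacher
    uncertainty.
    """
    s = F.softmax(student_logits, dim=-1)
    return -(s * (teacher_probs + eps).log()).sum(dim=-1).mean()
```

One plausible way to "incorporate" the reverse term, consistent with the abstract's final claim, is a convex combination such as loss = (1 - lam) * forward_ce(logits, t) + lam * reverse_ce(logits, t); the paper's actual recipe is not given in the abstract.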