Validation-Free Early Stopping for XGBoost via Spectral Saturation
Shahrizoda Jurakulova
Abstract
At every round of gradient boosting, the scaled prediction vector $g_t = \eta f_t$ of the round-$t$ tree is added to the model. We collect these vectors as the columns of a contribution matrix $G_{1:T} \in \mathbb{R}^{n \times T}$. Under squared loss, its stable rank $$r_s(G_{1:T}) := \|G_{1:T}\|_F^2 / \sigma_1(G_{1:T})^2$$ saturates: it is bounded above by a constant, independent of $T$, that depends only on the residual decay rate $\gamma$ and the learning rate $\eta$. Building on this result, we propose SESGB, an XGBoost callback that decides when to stop training without a held-out validation set. On 21 OpenML tabular benchmarks, SESGB does not match tuned validation-based stopping in mean accuracy: the median gap is $-0.81\%$, and accuracy on three datasets degrades by 6–17%. We therefore position SESGB as a structural fallback for the validation-free regime, not as a competitor when validation data is available.
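The stable rank can be computed directly from its definition. Because $\|G\|_F^2 = \sum_i \sigma_i(G)^2$, the ratio always lies between $1$ and $\operatorname{rank}(G) \le \min(n, T)$, so the non-trivial part of the saturation claim is that it stays bounded as $T$ grows. Below is a minimal pure-Python sketch, assuming the contribution matrix is supplied as a list of its $T$ columns; the function name `stable_rank` and its `iters`/`seed` parameters are illustrative choices, not part of SESGB or the XGBoost API, and the code only evaluates $r_s$ without implementing SESGB's stopping rule.

```python
import math
import random


def stable_rank(columns, iters=500, seed=0):
    """Return ||G||_F^2 / sigma_1(G)^2 for G given as a list of columns.

    Each column is a list of n floats (column t plays the role of g_t).
    sigma_1(G)^2 equals the largest eigenvalue of the T x T Gram matrix
    G^T G, which is estimated here by power iteration.
    """
    if not columns:
        raise ValueError("need at least one column")
    n = len(columns[0])
    if any(len(c) != n for c in columns):
        raise ValueError("all columns must have the same length n")
    T = len(columns)

    # Gram matrix K = G^T G with K[i][j] = <g_i, g_j>.
    K = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(T)]
         for i in range(T)]
    frob_sq = sum(K[i][i] for i in range(T))  # trace(G^T G) = ||G||_F^2
    if frob_sq == 0:
        raise ValueError("stable rank is undefined for the zero matrix")

    # Power iteration from a random start, which is almost surely not
    # orthogonal to the top eigenvector of K.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(T)]
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(T)) for i in range(T)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient v^T K v (v has unit norm) estimates sigma_1(G)^2.
    Kv = [sum(K[i][j] * v[j] for j in range(T)) for i in range(T)]
    sigma1_sq = sum(a * b for a, b in zip(v, Kv))
    return frob_sq / sigma1_sq
```

For example, `stable_rank([[1, 2, 3], [2, 4, 6], [-1, -2, -3]])` is 1 up to rounding, because every column is a multiple of the same vector, while `stable_rank([[3, 0], [0, 4]])` is $25/16$. With NumPy available, the same quantity is `np.linalg.norm(G, "fro")**2 / np.linalg.norm(G, 2)**2` for a 2-D array `G`; the version above avoids the dependency.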