On Contraction of Sequential and Offset Rademacher Complexities
Abstract
The Rademacher complexity of a function class is among the most basic notions of its ``size'' and yields the classical offline generalization bounds for Lipschitz loss functions that underpin much of the modern understanding of statistical learning. More recently, the sequential and offset Rademacher complexities were introduced to prove analogous generalization bounds for online learning and for prediction with squared loss. A fundamental structural result in the theory of Rademacher complexity, with many applications in learning theory, is the Ledoux--Talagrand contraction lemma, which states that the Rademacher complexity of the composition of a function class with a fixed Lipschitz function is at most the Lipschitz constant times that of the original class. We show that, under structural assumptions on the function class, this contraction extends to the sequential and offset Rademacher complexities at the price of polylogarithmic factors. We further show that these logarithmic factors cannot be removed in general and that, absent these additional structural assumptions, no such contraction inequality can hold. Together, these results indicate that the sequential and offset Rademacher complexities behave fundamentally differently from the classical Rademacher complexity with respect to contraction, which in turn has broad implications for understanding the sample complexity of online learning and of regression with squared loss over composed function classes.
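For reference, a standard formulation of the classical contraction inequality mentioned above is sketched below; the notation is an assumption of this sketch rather than the paper's own, with $\widehat{\mathfrak{R}}_n$ the empirical Rademacher complexity of a class $\mathcal{F}$ on points $x_1,\dots,x_n$ and $\phi$ a fixed $L$-Lipschitz function:
\[
\widehat{\mathfrak{R}}_n(\phi \circ \mathcal{F})
\;=\;
\mathbb{E}_{\epsilon}\Big[\sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} \epsilon_i\, \phi\big(f(x_i)\big)\Big]
\;\le\;
L \cdot \widehat{\mathfrak{R}}_n(\mathcal{F}),
\]
where $\epsilon_1,\dots,\epsilon_n$ are independent Rademacher (uniform $\pm 1$) signs. The question addressed in the paper is whether, and at what cost, an analogous inequality holds for the sequential and offset variants of this complexity.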