Abstract:
In this work, we study statistical learning with dependent data and square loss in a hypothesis class with Orlicz-type tail decay: $\mathscr{F} \subset L_{\Psi_p}$. Our inquiry is motivated by the search for a sharp noise interaction term, or variance proxy, in learning with dependent (e.g. $\beta$-mixing) data. Typical non-asymptotic results exhibit variance proxies that are deflated *multiplicatively* by the mixing time of the underlying covariates process. We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$, the empirical risk minimizer achieves a rate that depends only on the complexity of the class and second-order statistics in its leading term. We refer to this as a *near mixing-free rate*, since direct dependence on mixing is relegated to an additive, higher-order term. Our approach, which relies on mixed-tail generic chaining, allows us to obtain sharp, instance-optimal rates. Examples that satisfy our framework include sub-Gaussian linear regression and bounded smoothness classes.
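
To fix ideas, here is a minimal sketch of the setting in the notation above, assuming the standard square-loss ERM estimator over $T$ dependent samples and writing the comparability of $L^2$ and $\Psi_p$ as a single-constant norm inequality; the constant $C$ is illustrative and not taken from the abstract.

```latex
% Assumes amsmath and mathrsfs; notation (T, x_t, y_t, C) is illustrative.
% Square-loss empirical risk minimizer over T dependent samples (x_t, y_t):
\[
  \widehat{f} \in \operatorname*{arg\,min}_{f \in \mathscr{F}}
    \frac{1}{T} \sum_{t=1}^{T} \bigl( y_t - f(x_t) \bigr)^2 .
\]
% One way to formalize "the topologies of L^2 and Psi_p are comparable on F":
% there exists a constant C > 0 such that
\[
  \| f \|_{\Psi_p} \;\le\; C \, \| f \|_{L^2}
  \qquad \text{for all } f \in \mathscr{F} .
\]
% Under such a condition, the leading term of the ERM rate involves only the
% complexity of the class and second-order statistics; mixing enters only
% through an additive higher-order term.
```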