Timezone: »
We prove bounds on the population risk of the maximum margin algorithm for two-class linear classification. For linearly separable training data, the maximum margin algorithm has been shown in previous work to be equivalent to a limit of training with logistic loss using gradient descent, as the training error is driven to zero. We analyze this algorithm applied to random data including misclassification noise. Our assumptions on the clean data include the case in which the class-conditional distributions are standard normal distributions. The misclassification noise may be chosen by an adversary, subject to a limit on the fraction of corrupted labels. Our bounds show that, with sufficient over-parameterization, the maximum margin algorithm trained on noisy data can achieve nearly optimal population risk.
Author Information
Niladri Chatterji (UC Berkeley)
Phil Long (Google)
More from the Same Authors
-
2021 : When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations? »
Niladri Chatterji · Phil Long · Peter Bartlett -
2021 : On the Theory of Reinforcement Learning with Once-per-Episode Feedback »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett · Michael Jordan -
2023 Poster: Deep linear networks can benignly overfit when shallow ones do »
Niladri S. Chatterji · Phil Long -
2021 : On the Theory of Reinforcement Learning with Once-per-Episode Feedback »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett · Michael Jordan -
2019 Poster: Online learning with kernel losses »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett -
2019 Oral: Online learning with kernel losses »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett -
2018 Poster: Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks »
Peter Bartlett · Dave Helmbold · Phil Long -
2018 Poster: On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo »
Niladri Chatterji · Nicolas Flammarion · Yian Ma · Peter Bartlett · Michael Jordan -
2018 Oral: On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo »
Niladri Chatterji · Nicolas Flammarion · Yian Ma · Peter Bartlett · Michael Jordan -
2018 Oral: Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks »
Peter Bartlett · Dave Helmbold · Phil Long