Abstract

Links between Perceptrons, MLPs and SVMs
Ronan Collobert - IDIAP
Samy Bengio - IDIAP
We study the links between three important classification algorithms: Perceptrons, Multi-Layer Perceptrons (MLPs) and Support Vector Machines (SVMs). We first study ways to control the capacity of Perceptrons (mainly regularization parameters and early stopping), using the margin idea introduced with SVMs. After showing that under simple conditions a Perceptron is equivalent to an SVM, we show that training an SVM (and thus a Perceptron) with stochastic gradient descent can be computationally expensive, mainly because of the margin maximization term in the cost function. We then show that if this margin maximization term is removed, the learning rate or the use of early stopping can still control the margin. These ideas are then extended to the case of MLPs. Moreover, under some assumptions, MLPs also appear to be a kind of mixture of SVMs, maximizing the margin in the hidden layer space. Finally, we present a very simple MLP based on the previous findings, which yields better generalization performance and speed than the other models.
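As a rough illustration of the setting described above, the following minimal sketch trains a linear classifier by stochastic gradient descent on a hinge-type cost. It is not the authors' implementation. The function names (`train_linear`, `score`), the toy data, and the parameters `margin`, `weight_decay`, `lr` and `epochs` are all assumptions made for this example. Setting `margin=0` with `weight_decay=0` gives a Perceptron-style update, `margin=1` with `weight_decay>0` gives an SVM-style cost with an explicit margin maximization term, and `margin=1` with `weight_decay=0` drops that term so that only the learning rate and the number of epochs (early stopping) remain as controls.

```python
import random


def score(w, b, x):
    """Linear decision function w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear(data, margin=1.0, weight_decay=0.0, lr=0.1, epochs=50, seed=0):
    """SGD on max(0, margin - y * (w.x + b)) + (weight_decay / 2) * ||w||^2.

    data: list of (x, y) pairs with x a list of floats and y in {-1, +1}.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    b = 0.0
    examples = list(data)
    for _ in range(epochs):
        rng.shuffle(examples)
        for x, y in examples:
            s = score(w, b, x)
            if weight_decay > 0.0:
                # Gradient step on the margin maximization (weight decay) term.
                w = [wi * (1.0 - lr * weight_decay) for wi in w]
            if y * s <= margin:
                # Subgradient step on the hinge term for a margin violation.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


# Hypothetical, linearly separable toy data.
toy = [([2.0, 2.0], 1), ([3.0, 1.0], 1), ([2.0, 3.0], 1),
       ([-2.0, -1.0], -1), ([-1.0, -3.0], -1), ([-3.0, -2.0], -1)]

w_perc, b_perc = train_linear(toy, margin=0.0)                  # Perceptron-style
w_marg, b_marg = train_linear(toy, margin=1.0)                  # margin, no decay term
w_svm, b_svm = train_linear(toy, margin=1.0, weight_decay=0.01)  # SVM-style cost
```

In this sketch the weight decay step runs on every example, whether or not the margin is violated, which is one simple way to see why the margin maximization term adds cost to stochastic training.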
