Abstract

Robust Feature Induction for Support Vector Machines
Rong Jin - Michigan State University
Huan Liu - Arizona State University
The goal of feature induction is to automatically create nonlinear combinations of existing features as additional input features to improve classification accuracy. Typically, nonlinear features are introduced into a support vector machine (SVM) through a nonlinear kernel function. One disadvantage of such an approach is that the feature space induced by a kernel function is usually of high dimension and therefore substantially increases the chance of over-fitting the training data. Another disadvantage is that the nonlinear features are induced implicitly, which makes it difficult to understand which induced features are critical to classification performance. In this paper, we propose a boosting-style algorithm that can explicitly induce important nonlinear features for SVMs. We present empirical studies with discussion to show that this approach is effective in improving classification accuracy for SVMs. The comparison with an SVM model using nonlinear kernels also indicates that this approach is effective and robust, particularly when the number of training examples is small.
