Minimum Reference Set Based Feature Selection for Small Sample Classifications
Xue-wen Chen - Department of Electrical Engineering and Computer Science, The University of Kansas, USA
Jong Cheol Jeong - Department of Electrical Engineering and Computer Science, The University of Kansas, USA
We address feature selection for classification problems with small sample sizes and high dimensionality. A practical example is microarray-based cancer classification, where the number of samples is typically fewer than 100 and the number of features is several thousand or more. One commonly used method for this problem is recursive feature elimination (RFE), which exploits the generalization capability of support vector machines and is therefore well suited to small-sample problems. We propose a novel method based on the minimum reference set (MRS) generated by the nearest neighbor rule. The MRS is the smallest set of samples that, used as the reference set for the nearest neighbor rule, correctly classifies all the training samples. It is related to the structural risk minimization principle and thus leads to good generalization. The proposed MRS-based method is compared with RFE on several real datasets, and the experimental results show that the MRS method achieves better classification performance.
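To make the MRS definition concrete, the following is a minimal sketch, assuming Euclidean distance and a 1-nearest-neighbor rule with ties broken by reference order. It finds a minimum reference set by exhaustive search over subsets of increasing size, so it only works for very small training sets. The function names (`nearest_label`, `classifies_all`, `minimum_reference_set`) are hypothetical and illustrative. This is not the authors' construction procedure, and it does not show how the MRS is used to select features.

```python
from itertools import combinations
import math


def nearest_label(x, refs):
    """Return the label of the reference point closest to x (Euclidean distance).

    refs is a list of (point, label) pairs; on ties, min() keeps the first one.
    """
    _, label = min(refs, key=lambda r: math.dist(x, r[0]))
    return label


def classifies_all(ref_idx, X, y):
    """True if 1-NN using only the samples in ref_idx labels every training sample correctly."""
    refs = [(X[i], y[i]) for i in ref_idx]
    return all(nearest_label(X[j], refs) == y[j] for j in range(len(X)))


def minimum_reference_set(X, y):
    """Exhaustively search for a smallest subset of sample indices that is a consistent
    1-NN reference set. The cost is exponential in len(X), so this is for illustration only.
    Returns None if no subset works (e.g. identical points carrying different labels).
    """
    n = len(X)
    for k in range(1, n + 1):                 # try subset sizes from smallest upward
        for subset in combinations(range(n), k):
            if classifies_all(subset, X, y):
                return list(subset)           # first hit at size k is a minimum set
    return None


if __name__ == "__main__":
    # Two well-separated 2-D clusters: one reference per class should suffice.
    X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    y = [0, 0, 1, 1]
    print(minimum_reference_set(X, y))        # indices of the reference samples
```

On this toy data the search returns two indices, one from each cluster. The number of samples needed as references gives a rough sense of how complex the nearest neighbor decision boundary must be. For realistic training sets, an exhaustive search is infeasible and a heuristic is needed. One such heuristic is Hart's condensed nearest neighbor rule, although it does not guarantee a minimum set.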