Abstract

A MFoM Learning Approach to Robust Multiclass Multi-Label Text Categorization
Sheng Gao - Institute for Infocomm Research, Singapore
Wen Wu - Language Technologies Institute, School of Computer Science, Carnegie Mellon University, USA
Chin-Hui Lee - School of Electrical and Computer Engineering, Georgia Institute of Technology, USA
Tat-Seng Chua - School of Computing, National University of Singapore
We propose a multiclass (MC) classification approach to text categorization (TC). To take full advantage of both positive and negative training examples, a maximal figure-of-merit (MFoM) learning algorithm is introduced to train high-performance MC classifiers. In contrast to conventional binary classification, the proposed MC scheme assigns a uniform score function to each category for each given test sample, and thus the classical Bayes decision rules can now be applied. Since all the MC MFoM classifiers are trained simultaneously, we expect them to be more robust and to work better than the binary MFoM classifiers, which are trained separately and are known to give the best TC performance. Experimental results on the Reuters-21578 TC task indicate that, for the categories with fewer than 4 training samples, the MC MFoM classifiers achieve a micro-averaging F1 value of 0.377, which is significantly better than the 0.138 obtained with the binary MFoM classifiers. Furthermore, for all 90 categories, most with large training sizes, the MC MFoM classifiers give a micro-averaging F1 value of 0.888, better than the 0.884 obtained with the binary MFoM classifiers.
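
The results above are reported as micro-averaged F1, which pools true positives, false positives, and false negatives over all documents and categories before computing precision and recall. The following is a minimal sketch of that standard metric for multi-label output, not the authors' evaluation code; the function name `micro_f1` and the toy category labels are assumptions made for illustration.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label assignments.

    gold[i] and pred[i] are the sets of category labels that are
    correct for, and predicted for, document i.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but wrong
        fn += len(g - p)   # correct labels that were missed
    if tp == 0:
        return 0.0         # avoids division by zero when nothing matches
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical labels): 3 documents, one with two gold labels.
gold = [{"earn"}, {"acq", "trade"}, {"grain"}]
pred = [{"earn"}, {"acq"}, {"grain", "wheat"}]
score = micro_f1(gold, pred)
```

Because the counts are pooled, categories with many documents dominate the score, which is consistent with the abstract reporting the rare categories separately from the full 90-category set.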
