A MFoM Learning Approach to Robust Multiclass Multi-Label Text Categorization |
---|
Sheng Gao - Institute for Infocomm Research, Singapore; Wen Wu - Language Technologies Institute, School of Computer Science, Carnegie Mellon University, USA; Chin-Hui Lee - School of Electrical and Computer Engineering, Georgia Institute of Technology, USA; Tat-Seng Chua - School of Computing, National University of Singapore |
We propose a multiclass (MC) classification approach to text categorization (TC). To take full advantage of both positive and negative training examples, a maximal figure-of-merit (MFoM) learning algorithm is introduced to train high-performance MC classifiers. In contrast to conventional binary classification, the proposed MC scheme assigns a uniform score function to each category for each given test sample, so the classical Bayes decision rules can be applied. Since all the MC MFoM classifiers are trained simultaneously, we expect them to be more robust and to work better than the binary MFoM classifiers, which are trained separately and are known to give the best TC performance. Experimental results on the Reuters-21578 TC task indicate that, for the categories with fewer than four training samples, the MC MFoM classifiers achieve a micro-averaging F1 value of 0.377, which is significantly better than the 0.138 obtained with the binary MFoM classifiers. Furthermore, over all 90 categories, most of which have large training sets, the MC MFoM classifiers give a micro-averaging F1 value of 0.888, better than the 0.884 obtained with the binary MFoM classifiers. |
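
The abstract reports results as micro-averaging F1 values. As a reminder of what that metric pools, the following is a minimal sketch of the standard definition: true positives, false positives, and false negatives are summed over all documents and categories before precision and recall are combined. The function name `micro_f1`, the toy label sets, and the convention of returning 0.0 when there are no labels at all are assumptions made for illustration; they are not taken from the paper, whose evaluation code is not shown here.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label assignments.

    gold[i] and pred[i] are the true and predicted category sets of
    document i. Counts are pooled over all documents and categories.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # categories correctly assigned
        fp += len(p - g)   # categories assigned but not true
        fn += len(g - p)   # true categories that were missed

    # F1 = 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    # Assumed convention: return 0.0 when nothing is labeled or predicted.
    return 2 * tp / denom if denom else 0.0


# Toy example with illustrative category names (hypothetical data).
gold = [{"earn"}, {"grain", "wheat"}, {"acq"}]
pred = [{"earn"}, {"grain"}, {"acq", "earn"}]
score = micro_f1(gold, pred)  # TP=3, FP=1, FN=1
print(f"micro-F1 = {score:.3f}")
```

Because the counts are pooled, categories with many test documents dominate the score, which is consistent with the abstract reporting the rare categories (fewer than four training samples) separately from the full set of 90.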