DocumentCode :
2372229
Title :
A minimum classification error (MCE) framework for generalized linear classifier in machine learning for text categorization/retrieval
Author :
Wu Chou ; Li Li
Author_Institution :
Avaya Labs Research, 233 Mt. Airy Road, Basking Ridge, NJ 07920, USA
fYear :
2004
fDate :
16-18 Dec. 2004
Firstpage :
26
Lastpage :
33
Abstract :
In this paper, we present a theoretical framework for minimum classification error (MCE) training of generalized linear classifiers for text classification. We show that many important text classifiers, both probabilistic and non-probabilistic, can be unified under this framework, and that the proposed MCE training approach can be applied to improve classifier performance. In addition, we describe an effective MCE classifier training algorithm that uses AdaBoost to generate alternative initial classifiers, rather than to combine multiple classifiers as is typically done. This method is applied to MCE classifier training to overcome local minima in the search for optimal classifier parameters, exploiting the fact that the family of generalized linear classifiers is closed under AdaBoost. Moreover, we extend the loss function in MCE training to incorporate training-sample prior distributions, compensating for the imbalanced distribution of training data across categories. Experimental studies on text classification tasks show significant classification error reductions of 25% to 55%.
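The abstract gives no formulas, so the following is only a minimal sketch of what MCE training of a linear classifier commonly looks like, not the authors' algorithm. It assumes the usual MCE formulation: a misclassification measure equal to the best competing score minus the true-class score, smoothed by a sigmoid. The function names, the smoothing parameter `gamma`, the learning rate, and the inverse-frequency weighting used to stand in for the "training sample prior" term are all illustrative assumptions. The AdaBoost-based initialization is not shown.

```python
import math

def scores(W, x):
    """Linear discriminants g_j(x) = w_j . x + b_j (last entry of each row is the bias)."""
    return [sum(wi * xi for wi, xi in zip(w[:-1], x)) + w[-1] for w in W]

def predict(W, x):
    """Return the index of the class with the highest discriminant score."""
    g = scores(W, x)
    return max(range(len(g)), key=g.__getitem__)

def mce_loss(W, x, k, gamma=1.0):
    """Sigmoid-smoothed error for sample x with true class k.

    d = (best competing score) - (true-class score); d > 0 means misclassified.
    Assumes moderate score magnitudes (math.exp can overflow for extreme d).
    """
    g = scores(W, x)
    rival = max(range(len(g)), key=lambda j: -math.inf if j == k else g[j])
    d = g[rival] - g[k]
    return 1.0 / (1.0 + math.exp(-gamma * d)), rival

def inverse_prior_weights(data):
    """Assumed weighting: n / (num_classes * class_count), so weights sum to n."""
    counts = {}
    for _, k in data:
        counts[k] = counts.get(k, 0) + 1
    n, c = len(data), len(counts)
    return [n / (c * counts[k]) for _, k in data]

def mce_step(W, data, lr=0.5, gamma=1.0, weights=None):
    """One batch gradient-descent step on the weighted mean MCE loss.

    Returns the updated weights and the loss evaluated at the input W.
    """
    n = len(data)
    grad = [[0.0] * len(w) for w in W]
    total = 0.0
    for i, (x, k) in enumerate(data):
        wt = weights[i] if weights else 1.0
        loss, rival = mce_loss(W, x, k, gamma)
        total += wt * loss
        # Derivative of the sigmoid w.r.t. d; d touches only rows `rival` and `k`.
        coef = wt * gamma * loss * (1.0 - loss)
        for j, xj in enumerate(list(x) + [1.0]):
            grad[rival][j] += coef * xj
            grad[k][j] -= coef * xj
    new_W = [[w - lr * g / n for w, g in zip(wr, gr)] for wr, gr in zip(W, grad)]
    return new_W, total / n
```

In practice each row of `W` would be initialized from an existing classifier rather than from zeros, since the smoothed loss is non-convex and the result depends on the starting point.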
Keywords :
Boosting; Databases; Filtering; Information retrieval; Machine learning; Man machine systems; Natural languages; Routing; Text categorization; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications, 2004. Proceedings. 2004 International Conference on
Conference_Location :
Louisville, Kentucky, USA
Print_ISBN :
0-7803-8823-2
Type :
conf
DOI :
10.1109/ICMLA.2004.1383490
Filename :
1383490