Title :
Dimension reduction using least squares regression in multi-labeled text categorization
Author :
Park, Cheong Hee
Author_Institution :
Dept. of Comput. Sci. & Eng., Chungnam Nat. Univ., Daejeon
Abstract :
Dimension reduction is a preprocessing step by which a small number of optimal features are extracted. Among several statistical dimension reduction methods, linear discriminant analysis (LDA) performs dimension reduction so as to maximize class separability in the reduced dimensional space. However, in multi-labeled problems, data samples that belong to multiple classes lie in the overlapping region of those classes. This creates a conflict between maximizing the distances between classes and minimizing the scatter within classes. In this paper, we show that in multi-labeled text categorization, the outputs of multiple linear methods can be used to compose new features for a low-dimensional representation. In particular, we apply least squares regression and a linear support vector machine (SVM) to the multiple binary-class problems constructed from a multi-labeled problem. This yields optimal features in a low-dimensional space, which are then fed into another classification algorithm. Extensive experimental results in text categorization are presented, comparing the approach with other dimension reduction methods and multi-label classification algorithms.
Keywords :
classification; least squares approximations; regression analysis; support vector machines; text analysis; data samples; dimensional space reduction; feature extraction; least squares regression; linear discriminant analysis; linear support vector machine; low dimensional space; multilabel classification algorithms; multilabeled text categorization; multiple binary-class problems; multiple linear methods; statistical dimension reduction; Classification algorithms; Data mining; Indexing; Large scale integration; Least squares methods; Linear discriminant analysis; Scattering; Support vector machine classification; Support vector machines; Text categorization;
Conference_Title :
Computer and Information Technology, 2008. CIT 2008. 8th IEEE International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-2357-6
Electronic_ISBN :
978-1-4244-2358-3
DOI :
10.1109/CIT.2008.4594652
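The dimension reduction described in the abstract above (one binary problem per class, with the outputs of the per-class linear models used as new features) can be illustrated with the minimal sketch below. It is not the author's implementation. The abstract does not specify the target coding, the bias handling, any regularization, or which downstream classifier is used. The +1/-1 targets, the optional ridge term, and the k-nearest-neighbour classifier are therefore assumptions, and the function names `fit_ls_projection`, `transform` and `knn_multilabel_predict` are hypothetical. Only the least squares variant is shown. The linear SVM variant would replace each per-class regressor with a linear SVM and stack its decision values in the same way.

```python
import numpy as np

def fit_ls_projection(X, Y, ridge=0.0):
    """Fit one least squares regressor per label (one binary problem per class).

    X: (n, d) document-term matrix; Y: (n, k) 0/1 label-indicator matrix.
    Returns W of shape (d + 1, k); the last row holds the bias terms.
    """
    n, d = X.shape
    T = np.where(Y > 0, 1.0, -1.0)          # assumed coding: +1 if document has label j, else -1
    Xa = np.hstack([X, np.ones((n, 1))])    # constant column for the bias term
    if ridge > 0:
        # Ridge variant (an assumption, not from the paper); the bias is penalized too for brevity
        A = Xa.T @ Xa + ridge * np.eye(d + 1)
        return np.linalg.solve(A, Xa.T @ T)
    # Minimum-norm least squares solution; also works when d > n
    W, *_ = np.linalg.lstsq(Xa, T, rcond=None)
    return W

def transform(X, W):
    """Project documents onto the k regression outputs (the new low-dimensional features)."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xa @ W

def knn_multilabel_predict(Z_train, Y_train, Z_test, n_neighbors=3):
    """Hypothetical downstream classifier: assign label j when at least half
    of the nearest training points (Euclidean distance in Z-space) carry it."""
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    votes = Y_train[idx].mean(axis=1)       # fraction of neighbours having each label
    return (votes >= 0.5).astype(int)

# Toy example: 4 documents, 5 terms, 3 labels (documents 2 and 4 are multi-labeled)
X = np.array([[3., 0., 1., 0., 0.],
              [2., 1., 0., 0., 0.],
              [0., 0., 2., 3., 1.],
              [1., 2., 0., 2., 2.]])
Y = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 1]])

W = fit_ls_projection(X, Y)
Z = transform(X, W)          # one feature per label, independent of vocabulary size
print(Z.shape)
print(knn_multilabel_predict(Z, Y, Z, n_neighbors=1))
```

The reduced dimension equals the number of labels k rather than the vocabulary size d. A multi-labeled document is not forced into a single class region as in LDA. Its label memberships are instead reflected in several of its k coordinates being positive.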