DocumentCode :
390583
Title :
Text categorization based on concept indexing and principal component analysis
Author :
Ke, Huang ; Ma Shaoping
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Volume :
1
fYear :
2002
fDate :
28-31 Oct. 2002
Firstpage :
51
Abstract :
A major problem in text categorization is the high dimensionality of the feature vector space, which is commonly on the order of ten thousand dimensions. Reducing the dimensionality of the space while preserving categorization accuracy is useful both for improving categorization effectiveness and for applying new categorization algorithms. Current feature selection methods for text categorization are only partially effective in reducing dimensionality. We put forward a new algorithm, which combines concept indexing and principal component analysis, for reducing dimensionality. Our experiments show that this algorithm can effectively reduce dimensionality without sacrificing categorization accuracy.
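The abstract does not say how the two techniques are chained, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes the supervised form of concept indexing (each document is re-expressed by its dot products with unit-length class centroids) followed by principal component analysis on that reduced representation. The toy term-frequency vectors, the labels, and the function names `centroids`, `concept_index`, and `pca` are all hypothetical.

```python
import math


def centroids(docs, labels):
    """Unit-length mean vector per class (assumed 'concept' vectors)."""
    dims = len(docs[0])
    result = []
    for label in sorted(set(labels)):
        members = [d for d, l in zip(docs, labels) if l == label]
        mean = [sum(d[j] for d in members) / len(members) for j in range(dims)]
        norm = math.sqrt(sum(x * x for x in mean)) or 1.0
        result.append([x / norm for x in mean])
    return result


def concept_index(docs, concept_vectors):
    """Represent each document by its dot product with every concept vector."""
    return [[sum(a * b for a, b in zip(doc, c)) for c in concept_vectors]
            for doc in docs]


def pca(rows, n_components, iterations=200):
    """Project centered rows onto top components found by power iteration."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / max(n - 1, 1)
            for j in range(dims)] for i in range(dims)]
    components = []
    for _component in range(n_components):
        # Asymmetric start vector; assumed not orthogonal to the top direction.
        v = [1.0 / (j + 1) for j in range(dims)]
        for _step in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        cov_v = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        eigenvalue = sum(a * b for a, b in zip(v, cov_v))
        components.append(v)
        # Deflate so the next pass finds the next-largest direction.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(dims)]
               for i in range(dims)]
    return [[sum(a * b for a, b in zip(r, c)) for c in components]
            for r in centered]


# Hypothetical toy corpus: 4 documents over 5 terms, 2 categories.
docs = [
    [2.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 2.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 2.0, 0.0],
]
labels = ["sports", "sports", "finance", "finance"]

reduced = concept_index(docs, centroids(docs, labels))  # 5 terms -> 2 concepts
projected = pca(reduced, n_components=1)                # 2 concepts -> 1 axis
print(projected)
```

In this sketch the first step shrinks the space from the vocabulary size to the number of categories, and the second step compresses it further along the directions of greatest variance. A real system would use sparse matrices and a library eigensolver rather than pure-Python power iteration.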
Keywords :
database indexing; information retrieval; principal component analysis; categorization effectiveness; concept indexing; feature selection methods; feature vector space; principal component analysis; text categorization; Classification tree analysis; Computer science; Indexing; Intelligent systems; Internet; Principal component analysis; Prototypes; Space technology; Testing; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
TENCON '02. Proceedings. 2002 IEEE Region 10 Conference on Computers, Communications, Control and Power Engineering
Print_ISBN :
0-7803-7490-8
Type :
conf
DOI :
10.1109/TENCON.2002.1181212
Filename :
1181212