DocumentCode
390583
Title
Text categorization based on concept indexing and principal component analysis
Author
Ke, Huang ; Ma Shaoping
Author_Institution
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Volume
1
fYear
2002
fDate
28-31 Oct. 2002
Firstpage
51
Abstract
A major problem in text categorization is the high dimensionality of the feature vector space, which is typically about ten thousand. Reducing the dimensionality of the space while preserving categorization accuracy is useful both for improving categorization effectiveness and for applying new categorization algorithms. Current feature selection methods for text categorization are only partially effective in reducing dimensionality. We put forward a new algorithm for reducing dimensionality that combines concept indexing with principal component analysis. Our experiments show that this algorithm can effectively reduce dimensionality without sacrificing categorization accuracy.
Keywords
database indexing; information retrieval; principal component analysis; categorization effectiveness; concept indexing; feature selection methods; feature vector space; text categorization; Classification tree analysis; Computer science; Indexing; Intelligent systems; Internet; Prototypes; Space technology; Testing
fLanguage
English
Publisher
IEEE
Conference_Titel
TENCON '02. Proceedings. 2002 IEEE Region 10 Conference on Computers, Communications, Control and Power Engineering
Print_ISBN
0-7803-7490-8
Type
conf
DOI
10.1109/TENCON.2002.1181212
Filename
1181212