Title :
Text categorization based on concept indexing and principal component analysis
Author :
Ke, Huang ; Ma Shaoping
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Abstract :
A major problem in text categorization is the high dimensionality of the feature vector space, which commonly reaches about ten thousand. Reducing the dimensionality of the space while preserving categorization accuracy is useful for improving categorization effectiveness and for applying new categorization algorithms. Current feature selection methods for text categorization are only partially effective in reducing dimensionality. We put forward a new algorithm for reducing dimensionality that combines concept indexing and principal component analysis. Our experiments show that this algorithm can effectively reduce dimensionality without sacrificing categorization accuracy.
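The abstract does not say how the two techniques are combined, so the following is only a minimal sketch under stated assumptions: it uses a supervised, centroid-based form of concept indexing (documents projected onto unit-length class centroids) followed by PCA on the resulting low-dimensional vectors. The function names `concept_index` and `pca`, the toy data, and the order of the two steps are illustrative assumptions, not the authors' published method.

```python
import numpy as np

def concept_index(X, y):
    """Project documents onto unit-length class centroids (one axis per class).

    Assumption: a centroid-based, supervised variant of concept indexing.
    """
    classes = np.unique(y)
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])
    norms = np.linalg.norm(centroids, axis=1, keepdims=True)
    centroids = centroids / np.where(norms == 0, 1.0, norms)  # avoid divide-by-zero
    return X @ centroids.T, centroids

def pca(Z, k):
    """Keep the top-k principal components of Z via SVD of the centered data."""
    mean = Z.mean(axis=0)
    _, _, vt = np.linalg.svd(Z - mean, full_matrices=False)
    components = vt[:k]
    return (Z - mean) @ components.T, mean, components

# Toy data (hypothetical): 6 documents, 8 term weights each, 3 classes.
rng = np.random.default_rng(0)
X = rng.random((6, 8))
y = np.array([0, 0, 1, 1, 2, 2])

Z, centroids = concept_index(X, y)        # 8 term dimensions -> 3 concept dimensions
reduced, mean, components = pca(Z, k=2)   # 3 concept dimensions -> 2 components
```

In this sketch the concept-indexing step does most of the reduction (from the vocabulary size to the number of classes), and PCA then removes correlation among the remaining concept axes; a real term space of about ten thousand dimensions would be handled the same way, with sparse matrices in place of the dense toy array.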
Keywords :
database indexing; information retrieval; principal component analysis; categorization effectiveness; concept indexing; feature selection methods; feature vector space; text categorization; Classification tree analysis; Computer science; Indexing; Intelligent systems; Internet; Prototypes; Space technology; Testing
Conference_Title :
TENCON '02. Proceedings. 2002 IEEE Region 10 Conference on Computers, Communications, Control and Power Engineering
Print_ISBN :
0-7803-7490-8
DOI :
10.1109/TENCON.2002.1181212