DocumentCode :
2815259
Title :
Feature reduction for neural network based text categorization
Author :
Lam, Savio L Y ; Lee, Dik Lun
Author_Institution :
Dept. of Comput. Sci., Hong Kong Univ., Hong Kong
fYear :
1999
fDate :
1999
Firstpage :
195
Lastpage :
202
Abstract :
In a text categorization model using an artificial neural network as the text classifier, scalability is poor if the neural network is trained using the raw feature space, since textual data has a very high-dimensional feature space. We proposed and compared four dimensionality reduction techniques to reduce the feature space into an input space of much lower dimension for the neural network classifier. To test the effectiveness of the proposed model, experiments were conducted using a subset of the Reuters-22173 test collection for text categorization. The results showed that the proposed model was able to achieve high categorization effectiveness as measured by precision and recall. Among the four dimensionality reduction techniques proposed, principal component analysis was found to be the most effective in reducing the dimensionality of the feature space.
Keywords :
classification; feedforward neural nets; full-text databases; multilayer perceptrons; principal component analysis; text analysis; Reuters-22173 test collection; artificial neural network; categorization effectiveness; dimensionality reduction techniques; feature reduction; high-dimension feature space; input space; neural network based text categorization; precision; principal component analysis; recall; text categorization model; text classifier; Artificial neural networks; Computer science; Ducts; Neural networks; Principal component analysis; Scalability; Space technology; Testing; Text categorization; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Database Systems for Advanced Applications, 1999. Proceedings., 6th International Conference on
Conference_Location :
Hsinchu
Print_ISBN :
0-7695-0084-6
Type :
conf
DOI :
10.1109/DASFAA.1999.765752
Filename :
765752