DocumentCode :
3104854
Title :
An Efficient Document Categorization Model Based on LSA and BPNN
Author :
Li, Cheng Hua ; Park, Soon Cheol
fYear :
2007
fDate :
22-24 Aug. 2007
Firstpage :
9
Lastpage :
14
Abstract :
This paper proposes a new document categorization model that combines latent semantic analysis (LSA) with a back-propagation neural network (BPNN). In traditional word-matching document categorization systems, the most popular and straightforward way to represent a document is the vector space model (VSM). However, this approach has two drawbacks. First, it needs a large number of features to represent the documents, so the dimensionality is very high. Second, it does not take synonymy and polysemy into account, which can affect classification accuracy. LSA can overcome these problems by using statistically derived conceptual indices instead of individual words. It constructs a conceptual vector space in which each term and each document is represented as a vector. Introducing LSA into our model not only greatly reduces the dimensionality but also discovers important associative relationships between terms. It also helps to accelerate training and improve classification accuracy. We test our categorization model on the standard Reuters collection; experimental evaluations show that the model with LSA achieves dramatic dimensionality reduction while still producing good classification results.
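The LSA-then-BPNN pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the six-document corpus and its labels are hypothetical stand-ins for the Reuters collection, raw term frequencies are used without any weighting, documents are folded into the concept space as S_k^-1 U_k^T d after a truncated SVD, and the "BPNN" is assumed to be a one-hidden-layer network with sigmoid hidden units and a softmax/cross-entropy output trained by full-batch gradient descent. The helper names (`term_document_matrix`, `lsa_project`, `train_bpnn`, `fit`, `predict`) and all hyperparameters (k, hidden size, learning rate, epochs) are hypothetical choices.

```python
import numpy as np

# Hypothetical two-topic toy corpus; NOT the Reuters data used in the paper.
DOCS = [
    "wheat grain harvest export",
    "grain wheat corn crop",
    "corn harvest crop wheat",
    "oil crude barrel price",
    "crude oil opec output",
    "opec barrel oil price",
]
LABELS = [0, 0, 0, 1, 1, 1]  # 0 = grain topic, 1 = crude topic


def term_document_matrix(docs, vocab):
    """Raw term-frequency matrix A: one row per term, one column per document."""
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for token in doc.split():
            if token in index:  # out-of-vocabulary tokens are ignored
                A[index[token], j] += 1.0
    return A


def lsa_project(A, U_k, s_k):
    """Fold the document columns of A into the k-dim concept space: S_k^-1 U_k^T d."""
    return (U_k.T @ A).T / s_k


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)


def train_bpnn(X, y, n_classes, hidden=8, lr=1.0, epochs=3000, seed=0):
    """Assumed BPNN: one sigmoid hidden layer, trained by full-batch back-propagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    n = X.shape[0]
    for _ in range(epochs):
        # Forward pass.
        H = sigmoid(X @ W1 + b1)
        P = softmax(H @ W2 + b2)
        # Backward pass: softmax + cross-entropy gives (P - Y) at the output layer.
        dZ2 = (P - Y) / n
        dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)  # chain rule through the sigmoid
        # Gradient-descent updates (dZ1 was computed with the old W2).
        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)
    return W1, b1, W2, b2


def fit(docs, labels, k=2):
    vocab = sorted({t for d in docs for t in d.split()})
    A = term_document_matrix(docs, vocab)
    U, s, _ = np.linalg.svd(A, full_matrices=False)  # A = U diag(s) V^T
    U_k, s_k = U[:, :k], s[:k]  # keep only the k strongest concepts
    X = lsa_project(A, U_k, s_k)  # (n_docs, k) instead of (n_docs, n_terms)
    params = train_bpnn(X, np.array(labels), n_classes=max(labels) + 1)
    return vocab, U_k, s_k, params


def predict(model, docs):
    vocab, U_k, s_k, (W1, b1, W2, b2) = model
    X = lsa_project(term_document_matrix(docs, vocab), U_k, s_k)
    return np.argmax(sigmoid(X @ W1 + b1) @ W2 + b2, axis=1)


if __name__ == "__main__":
    model = fit(DOCS, LABELS, k=2)
    print(len(model[0]), "terms reduced to", model[1].shape[1], "concepts")
    print(predict(model, DOCS).tolist())
```

In this toy run the 12 distinct terms are collapsed to 2 concept dimensions before the network sees them, which illustrates the kind of dimensionality reduction the abstract credits with faster training. The abstract does not state which k the authors used, so on a real collection that value would have to be tuned.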
Keywords :
Acceleration; Information analysis; Information technology; Neural networks; Ontologies; Semantic Web; Support vector machine classification; Support vector machines; Testing; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advanced Language Processing and Web Information Technology, 2007. ALPIT 2007. Sixth International Conference on
Conference_Location :
Luoyang, Henan, China
Print_ISBN :
978-0-7695-2930-1
Type :
conf
DOI :
10.1109/ALPIT.2007.88
Filename :
4460607