DocumentCode
2947483
Title
Automatic documents classification
Author
Mohamed, Hoda K.
Author_Institution
Fac. of Eng., Cairo
fYear
2007
fDate
27-29 Nov. 2007
Firstpage
33
Lastpage
37
Abstract
Automatic document classification is of paramount importance to knowledge management in the information age. Document classification poses many challenges for learning systems, since the feature vector used to represent a document must capture some of the complex semantics of natural language. In this paper, we design an automatic document classification system and investigate the parameters and design decisions that affect the building of automatic classifiers. The system creates an item vector for each retrieved document and assigns a weight to each item. The vectors are selected using a combination of a stemmer algorithm and natural language processing techniques. Several weighting schemes are used. Documents are classified using a neural network (NN). We investigate different cases applied to the NN classifier, grouped by weighting scheme, the effect of weighting words in the title, and the number of inputs to the classifier. We analyze the performance of the classifier across these cases.
Keywords
classification; information retrieval; learning (artificial intelligence); natural language processing; neural nets; text analysis; automatic document classification; document retrieval; feature vector; knowledge management; learning system; natural language processing; neural network; stemmer algorithm; text analysis; Frequency; Humans; Information retrieval; Machine assisted indexing; Machine learning algorithms; Natural language processing; Natural languages; Neural networks; Text analysis; Text categorization; information retrieve; natural language processing and neural networks; stemmer algorithm; text classification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Engineering & Systems, 2007. ICCES '07. International Conference on
Conference_Location
Cairo
Print_ISBN
978-1-4244-1365-2
Electronic_ISBN
978-1-4244-1366-9
Type
conf
DOI
10.1109/ICCES.2007.4447022
Filename
4447022