Title :
Automatic documents classification
Author :
Mohamed, Hoda K.
Author_Institution :
Fac. of Eng., Cairo
Abstract :
Automatic document classification is of paramount importance to knowledge management in the information age. Document classification poses many challenges for learning systems, since the feature vector used to represent a document must capture some of the complex semantics of natural language. In this paper, we design an automatic document classification system and investigate the parameters and design decisions that affect the building of automatic classifiers. The system creates an item vector for each retrieved document and assigns a weight to each item. The vector items are selected using a combination of a stemming algorithm and natural language processing techniques. Several weighting schemes are used. Documents are classified using a neural network (NN). We investigate different cases applied to the NN classifier, distinguished by the weighting scheme, the effect of weighting words in the title, and the number of inputs to the classifier. The performance of the classifier is analyzed for each case.
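The abstract does not state which weighting schemes or title weights the system uses, so the following is only a minimal sketch of the general idea: build a weighted item vector per document, with title words counted more heavily. TF-IDF is swapped in here as one common weighting scheme, the regex tokenizer is a crude stand-in for the stemmer and NLP term selection, and the names tokenize, term_counts, tfidf_vectors, and the title_boost parameter are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters (a stand-in for stemming/NLP term selection)."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(title, body, title_boost=2.0):
    """Count terms; each title occurrence adds title_boost (an assumed parameter)."""
    counts = Counter()
    for term in tokenize(body):
        counts[term] += 1.0
    for term in tokenize(title):
        counts[term] += title_boost
    return counts


def tfidf_vectors(docs, title_boost=2.0):
    """Return (vocabulary, vectors) for a list of (title, body) pairs."""
    counts = [term_counts(title, body, title_boost) for title, body in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = {term: sum(1 for c in counts if term in c) for term in vocab}
    # Smoothed inverse document frequency (one of several possible variants).
    idf = {term: math.log(n_docs / df[term]) + 1.0 for term in vocab}
    vectors = [[c.get(term, 0.0) * idf[term] for term in vocab] for c in counts]
    return vocab, vectors
```

Each resulting vector could then serve as the input layer of a classifier such as a neural network; varying title_boost or the vocabulary size corresponds loosely to the title-weighting and number-of-inputs cases the abstract mentions.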
Keywords :
classification; information retrieval; learning (artificial intelligence); natural language processing; neural nets; text analysis; automatic document classification; document retrieval; feature vector; knowledge management; learning system; neural network; stemmer algorithm; Frequency; Humans; Machine assisted indexing; Machine learning algorithms; Natural languages; Neural networks; Text categorization; information retrieve; natural language processing and neural networks; text classification;
Conference_Title :
Computer Engineering & Systems, 2007. ICCES '07. International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-1365-2
Electronic_ISBN :
978-1-4244-1366-9
DOI :
10.1109/ICCES.2007.4447022