Title :
Automatic text categorization: case study
Author :
Corrêa, Renato Fernandes ; Ludermir, Teresa Bernarda
Abstract :
Text categorization is the process of classifying documents into one or more existing categories according to the themes or concepts present in their contents. Its most common application is document indexing in information retrieval systems (IRS). One way to make text categorization a viable task is to use machine-learning algorithms to automate text classification, allowing it to be carried out quickly, in a concise manner, and on a broad scale. The objective of this work is to present and compare the results of text categorization experiments using two types of artificial neural networks, the multilayer perceptron and the self-organizing map, and traditional machine-learning algorithms used for this task: the C4.5 decision tree, PART decision rules, and the Naive Bayes classifier.
Keywords :
classification; decision trees; information retrieval; learning (artificial intelligence); multilayer perceptrons; self-organising feature maps; text analysis; Naive Bayes classifier; PART decision rules; decision tree; document classification; document indexing; information retrieval systems; machine-learning; multilayer perceptron; neural networks; self-organizing maps; text categorization; Artificial neural networks; Computer aided software engineering; Electronic mail; Indexing; Machine learning algorithms; Self organizing feature maps
Conference_Title :
Neural Networks, 2002. SBRN 2002. Proceedings. VII Brazilian Symposium on
Print_ISBN :
0-7695-1709-9
DOI :
10.1109/SBRN.2002.1181457