Automatic text categorization: case study

Author

Corrêa, Renato Fernandes ; Ludermir, Teresa Bernarda

fYear

2002

fDate

2002

Firstpage

150

Abstract

Text categorization is the process of classifying documents into one or more existing categories according to the themes or concepts present in their contents. Its most common application is document indexing in information retrieval systems (IRS). One way to make text categorization a viable task is to use machine-learning algorithms to automate the classification, allowing it to be carried out quickly, in a concise manner, and over a broad range of documents. The objective of this work is to present and compare the results of text categorization experiments using two types of artificial neural network, the multilayer perceptron and the self-organizing map, and traditional machine-learning algorithms used for this task: the C4.5 decision tree, PART decision rules, and the Naive Bayes classifier.

Keywords

classification; decision trees; information retrieval; learning (artificial intelligence); multilayer perceptrons; self-organising feature maps; text analysis; Naive Bayes classifier; PART decision rules; decision tree; document classification; document indexing; information retrieval systems; machine-learning; multilayer perceptron; neural networks; self-organizing maps; text categorization; Artificial neural networks; Computer aided software engineering; Decision trees; Electronic mail; Indexing; Information retrieval; Machine learning algorithms; Multilayer perceptrons; Self organizing feature maps; Text categorization

fLanguage

English

Publisher

IEEE

Conference_Titel

Neural Networks, 2002. SBRN 2002. Proceedings. VII Brazilian Symposium on

Print_ISBN

0-7695-1709-9

Type

conf

DOI

10.1109/SBRN.2002.1181457

Filename

1181457