DocumentCode :
2983548
Title :
Inductive Model Generation for Text Categorization Using a Bipartite Heterogeneous Network
Author :
Rossi, Rafael G. ; de Paulo Faleiros, T. ; De Andrade Lopes, Alneu ; Rezende, Solange O.
Author_Institution :
Univ. of Sao Paulo, Sao Carlos, Brazil
fYear :
2012
fDate :
10-13 Dec. 2012
Firstpage :
1086
Lastpage :
1091
Abstract :
Algorithms for categorizing numeric data are usually applied to text categorization after a preprocessing phase that assigns weights to textual terms, which are treated as attributes. However, because of the characteristics of textual data, some data categorization algorithms are not effective for text categorization. Characteristics such as sparsity and high dimensionality can impair the quality of general-purpose classifiers. Here, we propose a text classifier based on a bipartite heterogeneous network that represents textual document collections. The algorithm induces a classification model by assigning weights to the objects that represent the terms of the document collection. The induced weights correspond to the influence of each term on the classification of the documents in which it appears. The least-mean-squares (LMS) algorithm is used in the inductive process. An empirical evaluation on a large number of textual document collections shows that the proposed IMBHN algorithm produces significantly better results than the k-NN, C4.5, SVM, and Naïve Bayes algorithms.
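The abstract says only that term weights are induced with the least-mean-squares algorithm over a document-term bipartite network. The sketch below is a hypothetical illustration of that idea, not the authors' IMBHN implementation. It assumes that each document-term edge carries a precomputed weight (e.g. a normalized term frequency), that each term object holds one weight per class, that a document's class scores are the edge-weighted sum of its terms' class weights, and that those term weights are updated with the delta (LMS) rule. The function names `train_term_weights` and `classify`, and the parameters `eta`, `max_iter`, and `tol`, are illustrative assumptions.

```python
def train_term_weights(W, Y, eta=0.1, max_iter=1000, tol=0.01):
    """Hypothetical LMS-style induction of term-class weights.

    W: list of documents, each a list of document-term edge weights.
    Y: list of one-hot class vectors, one per document.
    Returns F, where F[t][c] is the (assumed) influence of term t on class c.
    """
    n_terms, n_classes = len(W[0]), len(Y[0])
    F = [[0.0] * n_classes for _ in range(n_terms)]
    for _ in range(max_iter):
        sq_err = 0.0
        for w_d, y_d in zip(W, Y):
            # Document output: weighted sum of its terms' class weights.
            out = [sum(w_d[t] * F[t][c] for t in range(n_terms))
                   for c in range(n_classes)]
            err = [y_d[c] - out[c] for c in range(n_classes)]
            # Delta (LMS) rule: only terms connected to this document change.
            for t in range(n_terms):
                if w_d[t]:
                    for c in range(n_classes):
                        F[t][c] += eta * w_d[t] * err[c]
            sq_err += sum(e * e for e in err)
        # Stop early once the mean squared error over the epoch is small.
        if sq_err / len(W) < tol:
            break
    return F


def classify(w_d, F):
    """Assign the class with the largest weighted sum of term weights."""
    scores = [sum(w_d[t] * F[t][c] for t in range(len(w_d)))
              for c in range(len(F[0]))]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy collection: 4 documents, 3 terms, 2 classes (made-up data).
W = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]
F = train_term_weights(W, Y)
print([classify(w, F) for w in W])
```

In this sketch the learned model is just the table `F` of per-term class weights, so classifying a new document only requires the terms it contains. This mirrors the abstract's description of weights that express each term's influence on the documents in which it appears.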
Keywords :
least mean squares methods; network theory (graphs); pattern classification; text analysis; C4.5 algorithm; IMBHN algorithm; Naive Bayes algorithm; SVM algorithm; bipartite heterogeneous network; classification model; dimensionality characteristics; general purpose classifier; inductive model generation; k-NN algorithm; k-nearest neighbor; least mean square algorithm; numeric data categorization algorithm; sparsity characteristics; support vector machines; text categorization; textual data characteristics; textual document collection; textual term; Accuracy; Data models; Equations; Mathematical model; Training; Vectors; Heterogeneous Network; Text Categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
ISSN :
1550-4786
Print_ISBN :
978-1-4673-4649-8
Type :
conf
DOI :
10.1109/ICDM.2012.130
Filename :
6413804