Title :
MFCC and ARM algorithms for text categorization
Author :
Srinivas, M. ; Spreethi, K.P. ; Prasad, E.V. ; Kumari, Anitha S.
Author_Institution :
CSE, JNTUCE, Anantapur
Abstract :
Text categorization continues to be one of the most researched NLP problems because of the ever-increasing number of electronic documents and digital libraries. In this paper, we present a novel text categorization method that combines multitype features coselection for clustering (MFCC) with association rule mining to construct text classifiers. Because the high dimensionality of document text hinders categorization, feature clustering has been shown to be an ideal alternative to feature selection for reducing dimensionality. We therefore use MFCC to generate an efficient representation of documents and apply association rule mining to train text classifiers. The method was extensively tested and evaluated: it achieves higher or comparable classification accuracy and F1 scores compared with a decision tree classifier, and MFCC improves clustering performance.
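The abstract does not spell out how association rules become a classifier. The sketch below illustrates one generic associative-classification scheme; it is not the authors' method. It assumes each document has already been reduced to a set of feature-cluster IDs (the MFCC step itself is not implemented). It mines rules of the form "itemset → class" by brute-force enumeration of small itemsets rather than Apriori, and classifies a document with the highest-ranked matching rule. The function names, thresholds, and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(docs, labels, min_support=0.2, min_confidence=0.6, max_len=2):
    """Mine class association rules (itemset -> label) by brute-force enumeration.

    docs: list of sets of feature IDs (hypothetically, feature-cluster IDs).
    support    = (#docs containing itemset with this label) / (#docs)
    confidence = (#docs containing itemset with this label) / (#docs containing itemset)
    """
    n = len(docs)
    itemset_counts = Counter()  # how many docs contain each itemset
    rule_counts = Counter()     # how many docs contain (itemset, label)
    for features, label in zip(docs, labels):
        for k in range(1, max_len + 1):
            for itemset in combinations(sorted(features), k):
                itemset_counts[itemset] += 1
                rule_counts[(itemset, label)] += 1

    rules = []
    for (itemset, label), count in rule_counts.items():
        support = count / n
        confidence = count / itemset_counts[itemset]
        if support >= min_support and confidence >= min_confidence:
            rules.append((itemset, label, confidence, support))
    # Rank rules by confidence, then support, then longer antecedents (stable sort).
    rules.sort(key=lambda r: (r[2], r[3], len(r[0])), reverse=True)
    return rules


def classify(rules, features, default):
    """Return the label of the first (highest-ranked) rule whose itemset is in features."""
    for itemset, label, _confidence, _support in rules:
        if set(itemset) <= features:
            return label
    return default


# Toy example: each document is a set of hypothetical cluster IDs.
docs = [{"c1", "c2"}, {"c1", "c3"}, {"c1"}, {"c4", "c5"}, {"c4"}, {"c5", "c2"}]
labels = ["sports", "sports", "sports", "politics", "politics", "politics"]

rules = mine_class_rules(docs, labels)
for rule in rules:
    print(rule)
print(classify(rules, {"c5", "c9"}, default="unknown"))
```

Ranking by confidence first and falling back to a default label when no rule fires is a common design in associative classifiers. A real system would use a proper frequent-itemset miner and rule pruning instead of exhaustive enumeration.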
Keywords :
classification; data mining; feature extraction; pattern clustering; text analysis; NLP problem; association rule mining algorithm; digital library; document representation; electronic document; feature clustering; multitype feature coselection; text categorization; text classifier training; Association rules; Bayesian methods; Classification tree analysis; Data mining; Decision trees; Mel frequency cepstral coefficient; Software libraries; Support vector machines; Text categorization; Training data;
Conference_Title :
Computing, Communication and Networking, 2008. ICCCN 2008. International Conference on
Conference_Location :
St. Thomas, VI
Print_ISBN :
978-1-4244-3594-4
Electronic_ISBN :
978-1-4244-3595-1
DOI :
10.1109/ICCCNET.2008.4787780