DocumentCode :
2780009
Title :
MFCC and ARM algorithms for text categorization
Author :
Srinivas, M. ; Spreethi, K.P. ; Prasad, E.V. ; Kumari, Anitha S.
Author_Institution :
CSE, JNTUCE, Anantapur
fYear :
2008
fDate :
18-20 Dec. 2008
Firstpage :
1
Lastpage :
6
Abstract :
Text categorization continues to be one of the most researched NLP problems because of the ever-increasing volume of electronic documents and digital libraries. In this paper, we present a novel text categorization method that combines multitype features coselection for clustering (MFCC) with association rule mining (ARM) to construct text classifiers. The high dimensionality of document text hinders categorization, and feature clustering has proven to be an ideal alternative to feature selection for reducing that dimensionality. We therefore use MFCC to generate an efficient representation of documents and apply association rule mining to train text classifiers. The method was extensively tested and evaluated. It achieves classification accuracy and F1 scores that are higher than or comparable to those of a decision tree classifier. MFCC also improves clustering performance.
Keywords :
classification; data mining; feature extraction; pattern clustering; text analysis; NLP problem; association rule mining algorithm; digital library; document representation; electronic document; feature clustering; multitype feature coselection; text categorization; text classifier training; Association rules; Bayesian methods; Classification tree analysis; Data mining; Decision trees; Mel frequency cepstral coefficient; Software libraries; Support vector machines; Text categorization; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computing, Communication and Networking, 2008. ICCCN 2008. International Conference on
Conference_Location :
St. Thomas, VI
Print_ISBN :
978-1-4244-3594-4
Electronic_ISBN :
978-1-4244-3595-1
Type :
conf
DOI :
10.1109/ICCCNET.2008.4787780
Filename :
4787780