DocumentCode :
2354304
Title :
Classifying Text with Statistically Selected Features to Closely Related Categories
Author :
Meena, M. Janaki ; Chandran, K.R.
Author_Institution :
Dept. of Comput. Sci. & Eng., PSG Coll. of Technol., Coimbatore, India
fYear :
2009
fDate :
27-28 Oct. 2009
Firstpage :
297
Lastpage :
301
Abstract :
Text classification remains one of the most researched problems because of the continuously increasing volume of electronic documents and digital data. Classifying documents into closely related categories is the most complex task in text categorization. Feature selection is an essential preprocessing step that improves the efficiency and accuracy of text classifiers by removing redundant and irrelevant terms from the training corpus. In this paper, a novel feature selection algorithm based on chi-square statistics is proposed for the naive Bayes classifier. The proposed method not only identifies the features related to a class but also determines the type of dependency between each feature and the category. The performance of the classifier with features selected by the proposed method is compared with its performance with features selected by the conventional chi-square max method for closely related categories. Experiments were conducted with randomly chosen training documents from six closely related categories of the 20 Newsgroups benchmark. Experimental results show that the classifier achieves better classification accuracy with the positive features selected by the proposed method.
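The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it describes: scoring a term against a category with the standard 2x2 chi-square statistic and reading the type of dependency from the sign of the association. The function names, the count layout (a, b, c, d), and the `select_positive` helper are assumptions made for illustration, not the authors' method.

```python
def chi_square(a, b, c, d):
    """Standard chi-square statistic for a 2x2 term/category table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def dependency(a, b, c, d):
    """Classify the association by the sign of a*d - c*b.

    Positive: the term appears in the category more often than expected
    under independence. Negative: less often than expected.
    """
    diff = a * d - c * b
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "none"


def select_positive(term_counts, k):
    """Hypothetical helper: keep the k highest-scoring positive terms.

    term_counts maps each term to its (a, b, c, d) counts for one category.
    """
    positive = [
        (chi_square(*counts), term)
        for term, counts in term_counts.items()
        if dependency(*counts) == "positive"
    ]
    positive.sort(reverse=True)
    return [term for _, term in positive[:k]]


if __name__ == "__main__":
    # Made-up counts for one category, purely for illustration.
    counts = {
        "engine": (30, 5, 10, 55),   # concentrated inside the category
        "goalie": (5, 30, 55, 10),   # concentrated outside the category
        "the": (10, 10, 10, 10),     # independent of the category
    }
    print(select_positive(counts, k=2))
```

A term that is strongly associated with the absence of a category receives the same chi-square score as one strongly associated with its presence, which is why the sign check is needed to separate positive from negative features.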
Keywords :
Bayes methods; classification; feature extraction; statistical analysis; text analysis; chi-square max method; chi-square statistics; digital data; document classification; electronic document; feature selection; naive Bayes classifier; text categorization; text classification; Communications technology; Computer science; Data engineering; Educational institutions; Information filtering; Information filters; Information technology; Statistics; Supervised learning; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advances in Recent Technologies in Communication and Computing, 2009. ARTCom '09. International Conference on
Conference_Location :
Kottayam, Kerala
Print_ISBN :
978-1-4244-5104-3
Electronic_ISBN :
978-0-7695-3845-7
Type :
conf
DOI :
10.1109/ARTCom.2009.67
Filename :
5329463