DocumentCode :
2187038
Title :
A novel feature selection based on information gain using WordNet
Author :
Patil, L.H. ; Atique, Mohammad
Author_Institution :
Dept. of Comput. Sci. & Eng., Sant Gadge Baba Amravati Univ., Amravati, India
fYear :
2013
fDate :
7-9 Oct. 2013
Firstpage :
625
Lastpage :
629
Abstract :
Text classification and feature selection are widely used to organize text documents in digitized form. Because the number of digitized documents has grown enormously, text categorization and feature selection have become increasingly important and promising over the last decades. The central problem for text categorization is the large number of features present in the documents. Nowadays, with the rapid development of the web, large numbers of documents are available on the Internet, and digital libraries, news articles, magazines, the World Wide Web and large companies produce ever more of them. The major problem, however, is the high dimensionality of the data: many features are noisy or redundant and mislead the classifier. Feature selection is therefore an important step for reducing the dimensionality of the feature space and improving the performance and accuracy of text categorization. In this paper we discuss how document preprocessing and feature selection approaches support dimensionality reduction when dealing with massive amounts of data. Although many feature selection methods have been proposed, attribute reduction remains a problem, so a hybrid feature selection method has been designed to improve performance. Our approach preprocesses the documents, selects the important features using entropy-based information gain, and calculates the class probabilities in order to reduce the high dimensionality of the feature space.
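The entropy-based information gain mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard definition for a binary term-presence feature, IG(t) = H(C) - [P(t) H(C|t) + P(~t) H(C|~t)], with H(C) = -Σ P(c) log2 P(c) computed from class probabilities. The preprocessing here (lowercasing, alphabetic tokenization, a tiny hypothetical stop-word list) and the helper names `preprocess`, `information_gain` and `select_features` are illustrative assumptions. The abstract does not describe the WordNet step named in the title, so it is omitted, and this sketch is not the authors' exact hybrid method.

```python
import math
import re
from collections import Counter

# Hypothetical, minimal stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "for", "on"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words; return the set of terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {tok for tok in tokens if tok not in STOP_WORDS}


def entropy(labels):
    """Shannon entropy H(C) in bits, estimated from class frequencies in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(term, docs, labels):
    """IG(t) = H(C) - [P(t) H(C|t) + P(~t) H(C|~t)] for term presence/absence."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    p_t = len(with_t) / len(labels)
    conditional = p_t * entropy(with_t) + (1 - p_t) * entropy(without_t)
    return entropy(labels) - conditional


def select_features(texts, labels, k):
    """Preprocess the texts, score every vocabulary term by IG, keep the top k."""
    docs = [preprocess(t) for t in texts]
    vocab = set().union(*docs)
    scores = {t: information_gain(t, docs, labels) for t in vocab}
    # Highest gain first; ties broken alphabetically for a deterministic result.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    texts = [
        "the stock market rose",
        "stock prices fell on the market",
        "the team won the match",
        "a match for the team",
    ]
    labels = ["finance", "finance", "sport", "sport"]
    print(select_features(texts, labels, 4))  # ['market', 'match', 'stock', 'team']
```

In this toy corpus, terms that appear in exactly one class ("stock", "market", "team", "match") separate the classes perfectly and reach the maximum gain of 1 bit, while a term seen in a single document, such as "rose", scores only about 0.31 bits and would be discarded.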
Keywords :
Internet; database management systems; entropy; natural language processing; probability; software performance evaluation; text analysis; Internet; Web development; World Wide Web; digital libraries; document preprocessing; entropy; feature selection methods; feature space dimensionality reduction; high data dimensionality; information gain; magazines; news article; probability calculation; text categorization accuracy improvement; text categorization performance improvement; text classification; text document; Computer science; Databases; Entropy; Feature extraction; Knowledge discovery; Text categorization; Introduction; document preprocessing; feature selection approaches; information gain;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Science and Information Conference (SAI), 2013
Conference_Location :
London
Type :
conf
Filename :
6661804